Answered
Hi, I Am Trying To Clone An Experiment. Using The Server Gui, I Select 'Clone' And Then 'Enqueue'. In The Console Window, I See That Clearml Makes Sure The Environment Is Installed, And Then It Goes Into A 'Completed' Status Although The Experiment Did N

Hi, I am trying to clone an experiment. Using the server GUI, I select 'clone' and then 'enqueue'. In the console window, I see that clearml makes sure the environment is installed, and then it goes into a 'completed' status although the experiment did not run.

  
  
Posted one year ago

Answers 28


Yes, I create the experiment by calling Task.init.
As you suggested, in the experiment tab I define the script path and the working directory.
Again, the task only created the environment and after that reported 'completed' without running my code.
Attaching the log of the last run, with the setting of the script and the folder.

  
  
Posted one year ago

The only thing I need to do is clone my experiment. Can you help me make this happen?

  
  
Posted one year ago

I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.

Who/What created the initial experiment ?

I noticed that if I run the initial experiment by "python -m folder_name.script_name"

"-m module" as script entry is used to launch entry points like python modules (which is translated to "python -m script")
Why isn't the entry point just the python script?
The command line arguments are passed as arguments on the Args section of the Configuration section

  
  
Posted one year ago

Could you upload the log so I can have a look?

  
  
Posted one year ago

Could it be the file you are trying to run is not in the repository ?

It is unclear what file is missing. The only hint is "Keyerror: '.'" and I am not sure what that refers to. All my code files are in the repository. Maybe the problem is with some installed package file?

Are you running inside a docker ?

No, I am running inside a conda environment.

Any chance you can send the full log ?

What I sent is the full agent daemon log. If you are asking for the console output, then it is attached.

  
  
Posted one year ago

Who/What created the initial experiment ?

I created the initial experiment from command-line, with either "python folder/script.py" or "python -m folder.script".
Both end up with the experiment not running. I am attaching an agent daemon log where the initial experiment was called with "python folder/script.py".

Why isn't the entry point just the python script?

The entry point is folder.script and not just the script because I need the 'current' folder while running the script to be project root, so importing other packages in the project will work properly.
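The difference between the two launch styles can be reproduced outside ClearML. The sketch below builds a throwaway project (the names `demo_root_helper` and `folder/script.py` are made up for illustration) and starts the same script both ways from the project root. It only demonstrates how Python builds its import path; it makes no claim about how the agent launches a task.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Create a temporary project and launch folder/script.py in both styles."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A module that lives at the project root (hypothetical name).
        (root / "demo_root_helper.py").write_text("VALUE = 42\n")
        (root / "folder").mkdir()
        (root / "folder" / "__init__.py").write_text("")
        (root / "folder" / "script.py").write_text(
            "import demo_root_helper\nprint(demo_root_helper.VALUE)\n"
        )

        # Drop variables that would change how the child builds sys.path.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

        # Script style: the script's own directory (folder/) goes on sys.path.
        as_file = subprocess.run(
            [sys.executable, "folder/script.py"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        # Module style: the current directory (project root) goes on sys.path.
        as_module = subprocess.run(
            [sys.executable, "-m", "folder.script"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return as_file, as_module


if __name__ == "__main__":
    as_file, as_module = run_both_ways()
    print("python folder/script.py -> exit code", as_file.returncode)
    print("python -m folder.script -> exit code", as_module.returncode)
```

Run as a file, the script cannot import the root-level module because only `folder/` is on the import path; run with `-m` from the project root, the import succeeds.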

  
  
Posted one year ago

AgitatedDove14 , thank you so much for your help.
I had a long video session today with the Israeli clearml engineers. There were plenty of things I had to do, and the two major ones were to define the environment variable CLEARML_AGENT_SKIP_PIP_VENV_INSTALL so it points to my conda environment python, and to call 'import clearml' from the top of my file (it was called from inside a method).
So now I can clone 🎉
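For anyone repeating the first of these steps, a small pre-flight check can confirm that the variable really points at an interpreter before the agent is started. This is only a sketch: the helper name `skip_venv_python` is made up, and it checks nothing beyond the path being an executable file.

```python
import os
import sys


def skip_venv_python(env=None):
    """Return the path held in CLEARML_AGENT_SKIP_PIP_VENV_INSTALL, or None.

    Hypothetical helper: returns the value only if it names an existing,
    executable file (for example the python binary of a conda environment).
    """
    env = os.environ if env is None else env
    value = env.get("CLEARML_AGENT_SKIP_PIP_VENV_INSTALL", "")
    if value and os.path.isfile(value) and os.access(value, os.X_OK):
        return value
    return None


if __name__ == "__main__":
    found = skip_venv_python()
    if found is None:
        print("CLEARML_AGENT_SKIP_PIP_VENV_INSTALL is unset or not an executable file")
    else:
        print("Agent is expected to reuse:", found)
    print("This check itself is running under:", sys.executable)
```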

  
  
Posted one year ago

If you want to change the Args, go to the Args section in the Configuration tab. When the Task is in draft mode, you can edit them there.

  
  
Posted one year ago

I see such arguments (--script, --cwd) in the command 'clearml-task', but I am not using it. What I do is run my script ('python folder/script.py') and create a task inside it, using Task.init().

  
  
Posted one year ago

As you said, you just need to clone. Right-click, then clone?

  
  
Posted one year ago

Oh I see, what you need is to pass '--script script.py' as entry-point and ' --cwd folder' as working dir

  
  
Posted one year ago

That's pretty weird. I don't see any clear indication that something is wrong; it seems it simply doesn't execute the rest. Did it run successfully the first time, before you cloned it?

  
  
Posted one year ago

Great. If this is what you do, how come you need to change the entry script in the UI?

  
  
Posted one year ago

As you suggested, I tried with a git repository. Got a completely different error. Attached is the log file. Any idea what's wrong?

  
  
Posted one year ago

TimelyMouse69 , yes, it ran successfully the first time, before I cloned it.

  
  
Posted one year ago

AgitatedDove14 , I noticed that if I run the initial experiment by "python -m folder_name.script_name" then the script path contains the whole list of arguments as you observed.
On the other hand, if I run the initial experiment by "python folder_name/script_name.py", then the script path contains only 'script_name.py'.
In both cases I cannot clone the experiment, with the same results as I reported in my initial message.

  
  
Posted one year ago

But the python command does not have such arguments (--script, --cwd). What am I missing?
Or, do you mean that those should be added to the Args list when cloning?

  
  
Posted one year ago

Are you saying that odd script entry point was created by calling Task.init? (To clarify, this is the problem.)
Btw, after you clone the experiment you can always manually edit both the entry point and the working dir, which, based on what you said, should be "script.py" and "folder".

  
  
Posted one year ago

woot woot, glad to hear that!

  
  
Posted one year ago

Hi RotundSquirrel78
How did you end up with this command line?
/home/sigalr/.clearml/venvs-builds/3.8/code/unet_sindiff_1_level_2_resblk --dataset humanml --device 0 --arch unet --channel_mult 1 --num_res_blocks 2 --use_scale_shift_norm --use_checkpoint --num_steps 300000
The arguments passed are odd (there should be none, they are passed inside the execution), and I suspect this is the issue.

  
  
Posted one year ago

Any chance your code needs more than the main script, but it is not in a git repo? The agent supports either a single script file or a git repo with multiple files.

  
  
Posted one year ago

As written above, I did the right click clone, then I did right click enqueue.
The experiment reported 'running', and immediately after preparing the environment it reported 'completed', without actually running my code. Please look at the beginning of this thread for output logs and more details.

  
  
Posted one year ago

Bingo (I guess). My code is local, with multiple files. I will try to connect it to a git repo and let you know how it worked.
Does the agent support uncommitted changes in multiple files (on top of a git commit)?

  
  
Posted one year ago

Yes it does, but these files must be committed to begin with. Basically, think of it as the 'git diff' output being stored and then applied by the agent.

  
  
Posted one year ago

This seems to be the issue:
PYTHONPATH = '.'
How is that happening ?
Can you try to run the agent with:
PYTHONPATH= clearml-agent daemon ....
(Notice the prefix PYTHONPATH= clears the environment variable, which is evidently what makes the python commands fail.)
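The same idea can be checked from Python before launching anything. In the sketch below, `env_without_pythonpath` and `child_pythonpath` are made-up helper names. The shell prefix above sets the variable to an empty string, whereas this version removes it entirely; the child process ends up without the inherited '.' in both cases.

```python
import os
import subprocess
import sys


def env_without_pythonpath(env=None):
    """Return a copy of the environment with PYTHONPATH removed."""
    base = os.environ if env is None else env
    return {k: v for k, v in base.items() if k != "PYTHONPATH"}


def child_pythonpath(env):
    """Report the PYTHONPATH value a child Python process sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('PYTHONPATH'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Simulate the situation from the log: PYTHONPATH inherited as '.'.
    polluted = dict(os.environ, PYTHONPATH=".")
    print("inherited:", child_pythonpath(polluted))
    print("cleaned:  ", child_pythonpath(env_without_pythonpath(polluted)))
```

A dictionary built this way could be passed as `env=` to whatever process starts the agent, which has the same effect on the child as the shell prefix.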

  
  
Posted one year ago

AgitatedDove14 , I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.

  
  
Posted one year ago

FileNotFoundError: [Errno 2] No such file or directory
Could it be the file you are trying to run is not in the repository ?
Are you running inside a docker ?
Any chance you can send the full log ?

  
  
Posted one year ago

Attached are the agent log and the task log

  
  
Posted one year ago