AgitatedDove14 , thank you so much for your help.
I had a long video session today with the Israeli clearml engineers. There were plenty of things I had to do, and the two major ones were to define the environment variable CLEARML_AGENT_SKIP_PIP_VENV_INSTALL so it points to my conda environment python, and to call 'import clearml' from the top of my file (it was called from inside a method).
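A minimal sketch of the first fix, assuming the agent is started from a small Python launcher. The interpreter path and the queue name are placeholders, not values from this thread, and nothing is actually executed here; the helper only prepares the environment and the command.

```python
import os
import shlex


def agent_launch_spec(conda_python, queue="default", base_env=None):
    """Return (env, command) for starting a clearml-agent daemon.

    `conda_python` and `queue` are placeholders supplied by the caller.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Variable named in the thread, pointed at the conda environment's python.
    env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = conda_python
    command = ["clearml-agent", "daemon", "--queue", queue]
    return env, command


def as_shell_line(conda_python, command):
    """Render the equivalent one-line shell invocation."""
    prefix = "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=" + shlex.quote(conda_python)
    return prefix + " " + shlex.join(command)


if __name__ == "__main__":
    python_path = "/path/to/conda/envs/my_env/bin/python"  # hypothetical path
    env, command = agent_launch_spec(python_path)
    print(as_shell_line(python_path, command))
```

The returned `env` and `command` could then be handed to `subprocess.run(command, env=env)` on a machine where clearml-agent is installed.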
So now I can clone 🎉
This seems to be the issue: PYTHONPATH = '.'
How is that happening?
Can you try to run the agent with: PYTHONPATH= clearml-agent daemon ....
(Notice the prefix PYTHONPATH= clears the environment variable, which is obviously what fails the python commands)
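To illustrate what the `PYTHONPATH=` prefix does, here is a small sketch that starts a child Python process with the variable emptied and asks the child whether it still sees a value. The helper name is made up for this example; the real agent command is not run.

```python
import os
import subprocess
import sys


def child_sees_pythonpath(parent_env):
    """Start a child python with PYTHONPATH emptied; report if it sees a value."""
    env = dict(parent_env)
    env["PYTHONPATH"] = ""  # same effect as the `PYTHONPATH=` shell prefix
    probe = "import os; print(bool(os.environ.get('PYTHONPATH')))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # Pretend the parent shell had PYTHONPATH='.' as in the log above.
    print(child_sees_pythonpath({**os.environ, "PYTHONPATH": "."}))
```

The override applies only to the launched process; the parent shell keeps whatever value it had.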
Could it be the file you are trying to run is not in the repository ?
It is unclear which file is missing. The only hint is "KeyError: '.'" and I am not sure what that refers to. All my code files are in the repository. Maybe the problem is with some installed package file?
Are you running inside a docker ?
No, I am running inside a conda environment.
Any chance you can send the full log?
What I sent is the full agent daemon log. If you are asking for the console output, then it is attached.
FileNotFoundError: [Errno 2] No such file or directory
As you suggested, I tried with a git repository. Got a completely different error. Attached is the log file. Any idea what's wrong?
Yes, it does, but these files must be committed to begin with. Basically, think of it this way: the 'git diff' output is stored, and then the agent applies it.
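The 'git diff' idea can be sketched with plain git commands driven from Python. This is only an illustration of the mechanism described above, not the agent's code; the file names are invented, and the helper returns `None` when git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args, stdin=None):
    """Run git inside `repo` with a throwaway identity; return stdout."""
    cmd = ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com", *args]
    done = subprocess.run(cmd, cwd=repo, input=stdin, check=True,
                          capture_output=True, text=True)
    return done.stdout


def demo_uncommitted_changes():
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        tracked = repo / "train.py"
        tracked.write_text("lr = 0.1\n")
        git(repo, "add", "train.py")
        git(repo, "commit", "-q", "-m", "initial commit")

        tracked.write_text("lr = 0.01\n")               # edit to a committed file
        (repo / "new_module.py").write_text("x = 1\n")  # file never added to git

        stored_diff = git(repo, "diff")  # what would be stored with the task

        # Simulate the remote side: start from the commit, then apply the diff.
        git(repo, "checkout", "--", "train.py")
        git(repo, "apply", "-", stdin=stored_diff)
        return {"diff": stored_diff, "after_apply": tracked.read_text()}


if __name__ == "__main__":
    print(demo_uncommitted_changes())
```

The edit to the committed file travels in the diff, while the file that was never added does not appear in it, which is why the files have to be committed first.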
Bingo (I guess). My code is local, with multiple files. I will try to connect it to a git repo and let you know how it worked.
Does the agent support uncommitted changes in multiple files? (on-top of a git commit).
Any chance your code needs more than the main script, but it is not in a git repo? I ask because the agent supports either a single script file or a git repo with multiple files.
Yes, I create the experiment by calling Task.init.
As you suggested, in the experiment tab I define the script path and the working directory.
Again, the task only created the environment and after that reported 'completed' without running my code.
Attaching the log of the last run, with the setting of the script and the folder.
Are you saying you had that odd script entry-point created by calling Task.init? (To clarify this is the problem)
Btw after you clone the experiment you can always manually edit both entry point and working dir, which based on what you said should be "script.py" and "folder"
As written above, I did the right click clone, then I did right click enqueue.
The experiment reported 'running', and immediately after preparing the environment it reported 'completed', without actually running my code. Please look at the beginning of this thread for output logs and more details.
As you said, you just need to clone. Right-click, clone?
The only thing I need to do is clone my experiment. Can you help me make this happen?
Great. If this is what you do, how come you need to change the entry script in the UI?
I see such arguments (--script, --cwd) in the command 'clearml-task', but I am not using it. What I do is run my script ('python folder/script.py') and create a task inside it, using Task.init().
But the python command does not have such arguments (--script, --cwd). What am I missing?
Or, do you mean that those should be added to the Args list when cloning?
Oh I see, what you need is to pass '--script script.py' as entry-point and '--cwd folder' as working dir
Who/What created the initial experiment ?
I created the initial experiment from command-line, with either "python folder/script.py" or "python -m folder.script".
Both end up with the experiment not running. I am attaching an agent daemon log where the initial experiment was called with "python folder/script.py".
Why isn't the entry point just the python script?
The entry point is folder.script and not just the script because I need the 'current' folder while running the script to be project root, so importing other packages in the project will work properly.
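The difference between the two launch styles can be reproduced with a throwaway project. This sketch builds one in a temporary directory (the package name `common_lib` is made up) and runs the same script both ways from the project root.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Run folder/script.py as a file and as a module from the project root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "folder").mkdir()
        (root / "common_lib").mkdir()
        (root / "folder" / "__init__.py").write_text("")
        (root / "common_lib" / "__init__.py").write_text("")
        (root / "common_lib" / "helper.py").write_text("VALUE = 42\n")
        (root / "folder" / "script.py").write_text(
            "from common_lib.helper import VALUE\nprint(VALUE)\n"
        )
        # Drop variables that would change how sys.path is built.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
        as_file = subprocess.run(
            [sys.executable, "folder/script.py"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        as_module = subprocess.run(
            [sys.executable, "-m", "folder.script"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return as_file.returncode, as_module.returncode, as_module.stdout.strip()


if __name__ == "__main__":
    print(run_both_ways())
```

Run as a file, Python puts the script's own directory first on `sys.path`, so the sibling package is not found; run with `-m`, the current working directory comes first and the import succeeds.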
"-m module" as script entry is used to launch entry points like python modules (which is translated to "python -m script")
The command line arguments are passed as arguments on the Args section of the Configuration section
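As a purely illustrative sketch of that split (this is not the agent's actual code, and the function name is invented), the entry point decides how Python is launched, while the Args are kept separately and appended afterwards:

```python
import shlex


def entry_point_to_argv(entry_point, script_args=(), python="python"):
    """Map a stored entry point plus separately stored args to an argv list."""
    parts = shlex.split(entry_point)
    if parts and parts[0] == "-m":
        # "-m folder.script" is launched as: python -m folder.script
        launcher = [python, "-m", *parts[1:]]
    else:
        # "script.py" is launched as: python script.py
        launcher = [python, *parts]
    return launcher + list(script_args)


if __name__ == "__main__":
    print(entry_point_to_argv("-m folder.script"))
    print(entry_point_to_argv("script.py", ["--device", "0"]))
```

Under this reading, an entry point that already contains the whole argument list mixes the two parts together.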
AgitatedDove14 , I noticed that if I run the initial experiment by "python -m folder_name.script_name" then the script path contains the whole list of arguments as you observed.
On the other hand, if I run the initial experiment by "python folder_name/script_name.py", then the script path contains only 'script_name.py'.
In both cases I cannot clone the experiment, with the same results as I reported in my initial message.
TimelyMouse69 , yes, it ran successfully the first time, before I cloned it.
AgitatedDove14 , I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.
If you want to change the Args, go to the Args section in the Configuration tab. When the Task is in draft mode, you can edit them there.
Hi RotundSquirrel78
How did you end up with this command line?
/home/sigalr/.clearml/venvs-builds/3.8/code/unet_sindiff_1_level_2_resblk --dataset humanml --device 0 --arch unet --channel_mult 1 --num_res_blocks 2 --use_scale_shift_norm --use_checkpoint --num_steps 300000
The arguments passed are odd (there should be none, they are passed inside the execution) and I suspect this is the issue
That's pretty weird. I don't see any clear indications something is wrong, it simply doesn't execute the rest it would seem. Did it successfully run the first time before cloning it?
Attached are the agent log and the task log
Could you upload the log so I can have a look?