
I see such arguments (--script, --cwd) in the command 'clearml-task', but I am not using it. What I do is run my script ('python folder/script.py') and create a task inside it, using Task.init().
Could it be that the file you are trying to run is not in the repository?
It is unclear what file is missing. The only hint is "KeyError: '.'" and I am not sure what that refers to. All my code files are in the repository. Maybe the problem is with some installed package file?
Are you running inside a Docker container?
No, I am running inside a conda environment.
Any chance you can send the full log?
What I sent is the full agent daemon log. If you are asking for the console...
But the python command does not have such arguments (--script, --cwd). What am I missing?
Or, do you mean that those should be added to the Args list when cloning?
No. I put a breakpoint in my python script and examined os.environ. The only environment variable with 'CLEARML' in its name is CLEARML_PROC_MASTER_ID, whose value is '16188:' (maybe it means something to you?)
Yes, I create the experiment by calling Task.init.
As you suggested, in the experiment tab I defined the script path and the working directory.
Again, the task only created the environment and after that reported 'completed' without running my code.
Attaching the log of the last run, with the script path and working directory set.
AgitatedDove14, thank you so much for your help.
I had a long video session today with the Israeli clearml engineers. There were plenty of things I had to do, and the two major ones were to define the environment variable CLEARML_AGENT_SKIP_PIP_VENV_INSTALL so that it points to my conda environment's Python executable, and to call 'import clearml' at the top of my file (it was being called from inside a method).
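A minimal pre-flight sketch, assuming (as in the setup above) that CLEARML_AGENT_SKIP_PIP_VENV_INSTALL holds the path of the conda environment's interpreter. The helper `skip_venv_python` is hypothetical, not part of clearml-agent; it only checks that the variable points at an executable file before the agent is started, which catches a mistyped conda path early.

```python
import os
from typing import Mapping, Optional

VAR = "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"


def skip_venv_python(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the interpreter path stored in CLEARML_AGENT_SKIP_PIP_VENV_INSTALL.

    Hypothetical check: returns None if the variable is unset or empty, and
    raises ValueError if it is set to something that is not an executable
    file. Assumes the variable is used to hold an interpreter path.
    """
    value = env.get(VAR, "").strip()
    if not value:
        return None
    if not (os.path.isfile(value) and os.access(value, os.X_OK)):
        raise ValueError(f"{VAR}={value!r} is not an executable interpreter")
    return value


if __name__ == "__main__":
    print(skip_venv_python() or f"{VAR} is not set")
```

Inside the activated conda environment, `which python` prints the path that would go into the variable.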
So now I can clone 🎉
TimelyMouse69, yes, the first run succeeded before I cloned it.
As you suggested, I tried with a git repository. Got a completely different error. Attached is the log file. Any idea what's wrong?
I was still having the issue and then I recalled an old solution, that worked again today. Here it is:
F12 --> Application tab --> Storage --> Clear site data --> refresh the clearml login screen
Who/what created the initial experiment?
I created the initial experiment from command-line, with either "python folder/script.py" or "python -m folder.script".
Both end up with the experiment not running. I am attaching an agent daemon log where the initial experiment was called with "python folder/script.py".
Why isn't the entry point just the python script?
The entry point is folder.script and not just the script because I need the 'current' folder while running the script ...
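The 'current' folder stays importable with `python -m folder.script` because of how Python initializes `sys.path[0]`: with `-m` it is the current working directory, while `python folder/script.py` puts the script's own directory (`folder/`) there instead. The sketch below (hypothetical helper `first_sys_path_entry`, assuming Python 3.7 or later, where the `-m` entry is an absolute path) builds a throwaway `folder/script.py` and runs it both ways to show the difference.

```python
import os
import subprocess
import sys
import tempfile


def first_sys_path_entry(root: str, module_style: bool) -> str:
    """Create root/folder/script.py, run it from `root`, and return its sys.path[0].

    Hypothetical demo helper: module_style=True runs `python -m folder.script`,
    module_style=False runs `python folder/script.py`.
    """
    pkg = os.path.join(root, "folder")
    os.makedirs(pkg, exist_ok=True)
    # An empty __init__.py makes `folder` a regular package for `-m`.
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "script.py"), "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")
    if module_style:
        cmd = [sys.executable, "-m", "folder.script"]
    else:
        cmd = [sys.executable, os.path.join("folder", "script.py")]
    # PYTHONSAFEPATH (3.11+) would suppress the prepended entry, so drop it here.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(cmd, cwd=root, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("-m     :", first_sys_path_entry(root, module_style=True))
        print("script :", first_sys_path_entry(root, module_style=False))
```

With `-m`, modules next to `folder/` (in the working directory) can be imported; with the plain script path, only modules inside `folder/` are found without extra `sys.path` changes.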
Many errors ☹️. Any idea what they mean?
Bingo (I guess). My code is local, with multiple files. I will try to connect it to a git repo and let you know how it worked.
Does the agent support uncommitted changes in multiple files (on top of a git commit)?
Yes, I am using the trains server. We never took the time to update it to clearml.
The version (according to pip freeze) is 0.16.3.
I get an empty list for the 'XHR' filter.
Where do I see the agent print outs?
I am using an old version. It's a trains server of version 0.16.3.
I am not sure it matters for the following output, but anyway please note that the clearml dockers are down right now.
sigalr@momo:~$ curl -XGET http://localhost:9200/_cat/indices
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 2F6APbQWSvajTZQ5JxXY1Q 1 1 59 0 26.2kb 26.2kb
yellow open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b bZMKKCaKRXCys6VD_9oDDw 1 1 8556 0 4.1mb 4.1mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 c85DhB...
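Without the `v` parameter, `_cat/indices` prints one whitespace-separated row per index in a fixed column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size in Elasticsearch 7.x). A minimal sketch for turning that listing into per-index document counts, for example to compare the `events-*` indices before and after a migration, assuming that column order; the helper names are hypothetical.

```python
from typing import List, NamedTuple

# Default column order of `GET _cat/indices` (no `v` header), Elasticsearch 7.x.
COLUMNS = ["health", "status", "index", "uuid", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]


class IndexRow(NamedTuple):
    health: str
    status: str
    index: str
    docs_count: int


def parse_cat_indices(text: str) -> List[IndexRow]:
    """Parse `_cat/indices` output; lines without all 10 columns are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # blank, truncated, or closed-index lines
        rec = dict(zip(COLUMNS, fields))
        if not rec["docs.count"].isdigit():
            continue  # e.g. the header row printed by `?v`
        rows.append(IndexRow(rec["health"], rec["status"], rec["index"],
                             int(rec["docs.count"])))
    return rows


def docs_with_prefix(rows: List[IndexRow], prefix: str) -> int:
    """Total document count over all indices whose name starts with `prefix`."""
    return sum(r.docs_count for r in rows if r.index.startswith(prefix))


if __name__ == "__main__":
    import sys
    for r in sorted(parse_cat_indices(sys.stdin.read()), key=lambda r: r.index):
        print(f"{r.health:7} {r.docs_count:>10} {r.index}")
```

The curl output can be piped straight into the script. A `yellow` health on a single-node setup usually just means the replica shards (`rep` is 1 here) cannot be allocated; on its own it does not indicate missing data.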
As written above, I right-clicked the experiment and chose Clone, then right-clicked the clone and chose Enqueue.
The experiment reported 'running', and immediately after preparing the environment it reported 'completed', without actually running my code. Please look at the beginning of this thread for output logs and more details.
Just to make sure: by "running the ES migration" you mean running elastic_upgrade.py again, correct?
It took ~36 hours two days ago.
The ES migration log is attached in the 1st message of this thread. Do you see any problems in it?
Is there any way to check whether the ES migration results are good?
Update: I ran the mongo migration script (clearml-server-1.2.0-migration.py) and now I can see my projects! 👏
Now there is a new problem: I don't see any of the logs: console, artifacts, scalars, plots.
Can you help?
Yes, I've performed the ES migration. The data is in clearml/data/elastic_7.
AgitatedDove14, SuccessfulKoala55, after I ran elastic_upgrade.py (stage 5 as described above), I saw a new folder named data/mongo_4. Doesn't that mean MongoDB was already migrated?
Is it ok to restore data/mongo from my backup, and leave all the other files that were created by elastic_upgrade.py (e.g., data/elastic_7) untouched?
What I mean, is: Do I need to run elastic_upgrade.py again, or just the mongo upgrade (clearml-server-1.2.0-migration.py)?
AppetizingMouse58, SuccessfulKoala55 and AgitatedDove14, after running the ES migration a second time, the problem is solved 🎉. Thank you all for your help! 🙏
Now all I need is to be able to clone my experiment. Can you help me make that happen?
The clearml dockers are down right now because I started a new ES migration (elastic_upgrade.py). I started it before you contacted me and I don't want to break it now. So I cannot look at the console right now.
It will probably finish 30 hours from now. If the same problems repeat, we will continue this chat then.
In docker-compose.yml, I replaced every occurrence of /opt/clearml/data/elastic_7 with /home/orpat/clearml/data/elastic_7.
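When moving data directories like this, a plain text replace is fine as long as the old path only appears on the host side of the volume mappings; a broader prefix such as /opt/clearml could also appear on the container side of some entry. A minimal sketch (hypothetical helper `relocate_host_paths`, handling only unquoted short-syntax `- host:container` volume lines) that rewrites just the host side:

```python
import re

# A short-syntax volume entry: "<indent>- <host>:<container>[:<mode>]".
VOLUME_LINE = re.compile(r"^(\s*-\s*)([^:\s]+)(:.*)$")


def _is_under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or lies inside that directory."""
    return path == prefix or path.startswith(prefix + "/")


def relocate_host_paths(compose_text: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite host-side volume paths under old_prefix to live under new_prefix.

    Container-side paths are left alone, even if they share the prefix.
    Quoted or long-syntax volume entries are not handled by this sketch.
    """
    old_prefix = old_prefix.rstrip("/")
    new_prefix = new_prefix.rstrip("/")
    out = []
    for line in compose_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        newline = line[len(body):]  # keep the original line ending
        m = VOLUME_LINE.match(body)
        if m and _is_under(m.group(2), old_prefix):
            host = new_prefix + m.group(2)[len(old_prefix):]
            body = m.group(1) + host + m.group(3)
        out.append(body + newline)
    return "".join(out)
```

Passing /opt/clearml/data/elastic_7 and /home/orpat/clearml/data/elastic_7 reproduces the single-directory change above; passing /opt/clearml and /home/orpat/clearml would relocate every host directory under that root.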