No. I put a breakpoint in my Python script and examined os.environ. The only environment variable with 'CLEARML' in its name is CLEARML_PROC_MASTER_ID, whose value is '16188:' (maybe it means something to you?)
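For reference, the same check can be done without a debugger. This is only a minimal sketch: the helper name clearml_vars is made up for illustration, and it simply filters os.environ for names containing 'CLEARML'.

```python
import os


def clearml_vars(environ=None):
    """Return the environment variables whose name contains 'CLEARML'.

    `environ` defaults to os.environ; passing a plain dict is handy for testing.
    """
    env = os.environ if environ is None else environ
    return {name: value for name, value in env.items() if "CLEARML" in name}


if __name__ == "__main__":
    # Print whatever ClearML-related variables the current process can see.
    for name, value in sorted(clearml_vars().items()):
        print(f"{name}={value}")
```

Calling it from inside the script that the agent runs shows what that process actually inherits.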
Could it be that the file you are trying to run is not in the repository?
It is unclear what file is missing. The only hint is "Keyerror: '.'" and I am not sure what that refers to. All my code files are in the repository. Maybe the problem is with some installed package file?
Are you running inside a docker?
No, I am running inside a conda environment.
Any chance you can send the full log?
What I sent is the full agent daemon log. If you are asking for the console...
TimelyMouse69, yes, it ran successfully the first time, before I cloned it.
As you suggested, I tried with a git repository. Got a completely different error. Attached is the log file. Any idea what's wrong?
As written above, I did the right click clone, then I did right click enqueue.
The experiment reported 'running', and immediately after preparing the environment it reported 'completed', without actually running my code. Please look at the beginning of this thread for output logs and more details.
The clearml dockers are down right now because I started a new ES migration (elastic_upgrade.py). I started it before you contacted me and I don't want to break it now. So I cannot look at the console right now.
It will probably finish 30 hours from now. If the same problems repeat, we will continue this chat then.
The only thing I need to do is clone my experiment. Can you help me make this happen?
But the python command does not have such arguments (--script, --cwd). What am I missing?
Or, do you mean that those should be added to the Args list when cloning?
AgitatedDove14, I did nothing to generate a command line. I just cloned the experiment and enqueued it, using the server GUI.
I clicked Fetch/XHR and got the following (after another reboot)
I did not upgrade anything and did not do docker pull.
I am having a temporary network issue. I will send the output of 'docker inspect' as soon as I can reconnect to my server.
I will try it and keep you posted. Thanks!
Many errors :white_frowning_face: . Any idea what they mean?
AgitatedDove14 SuccessfulKoala55, after I ran elastic_upgrade.py (stage 5 as described above), I saw there was a new folder named data/mongo_4. Doesn't that mean mongodb was already migrated?
Here is the developer tool Network screen capture after refreshing the page and trying to login.
My original trains server version was 0.14, if I remember correctly. Is there anywhere I can check it after the upgrade has been done?
My new clearml server is 1.5. I get that from http://localhost:8080/version.json but if there is somewhere else I should look, let me know.
Yes I've performed the ES migration. The data is in clearml/data/elastic_7.
The sequence is unclear then:
I followed the instructions in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_es7_migration/ .
Stage 5 ("python elastic_upgrade.py") ended successfully.
Then I skipped "Upgrading to ClearML Server v.1.2. or Newer" and went straight to "Completing the Installation".
Did I do something wrong? What should I do to fix it?
Is there any log that maybe details the problem?
Bingo (I guess). My code is local, with multiple files. I will try to connect it to a git repo and let you know how it worked.
Does the agent support uncommitted changes in multiple files (on top of a git commit)?
I see such arguments (--script, --cwd) in the command 'clearml-task', but I am not using it. What I do is run my script ('python folder/script.py') and create a task inside it, using Task.init().
Just to make sure, by running ES migration you mean running elastic_upgrade.py again. Correct?
It took ~36 hours two days ago.
The ES migration log is attached in the 1st message of this thread. Do you see any problems in it?
Is there any way to check whether the ES migration results are bad?
Is it ok to restore data/mongo from my backup, and leave all the other files that were created by elastic_upgrade.py (e.g., data/elastic_7) untouched?
What I mean, is: Do I need to run elastic_upgrade.py again, or just the mongo upgrade (clearml-server-1.2.0-migration.py)?
In the file docker-compose.yml I replaced every occurrence of /opt/clearml/data/elastic_7 with /home/orpat/clearml/data/elastic_7.
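In case it helps anyone doing the same edit, here is a rough sketch of that replacement as a script rather than by hand. The two paths are the ones from my setup; the function names and the .bak backup file are assumptions made for illustration, not part of any ClearML tooling.

```python
from pathlib import Path

# Paths taken from the message above; adjust them for your own installation.
OLD_PREFIX = "/opt/clearml/data/elastic_7"
NEW_PREFIX = "/home/orpat/clearml/data/elastic_7"


def rewrite_paths(text, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return the updated text and the number of occurrences replaced."""
    return text.replace(old, new), text.count(old)


def rewrite_file(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Rewrite a file in place, saving the original next to it as <name>.bak."""
    path = Path(path)
    original = path.read_text()
    updated, count = rewrite_paths(original, old, new)
    if count:
        # Only touch the file when there is something to change.
        path.with_name(path.name + ".bak").write_text(original)
        path.write_text(updated)
    return count


if __name__ == "__main__":
    changed = rewrite_file("docker-compose.yml")
    print(f"replaced {changed} occurrence(s)")
```

Keeping the backup makes it easy to diff the two files and confirm that only the volume paths changed.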
AgitatedDove14, thank you so much for your help.
I had a long video session today with the Israeli clearml engineers. There were plenty of things I had to do, and the two major ones were to define the environment variable CLEARML_AGENT_SKIP_PIP_VENV_INSTALL so it points to my conda environment python, and to call 'import clearml' from the top of my file (it was called from inside a method).
So now I can clone.
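For anyone who hits the same issue, here is a minimal sketch of the first fix, assuming the variable is prepared from a small Python wrapper before the agent is started. The helper name agent_environment is hypothetical, and using sys.executable assumes the wrapper itself is run with the conda environment's python; the second fix is just moving 'import clearml' to the top of the script.

```python
import os
import sys

ENV_VAR = "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"


def agent_environment(python_path=sys.executable, base_env=None):
    """Return a copy of the environment with ENV_VAR set to the given python.

    `python_path` defaults to the interpreter running this wrapper, which is
    the conda environment's python if the wrapper is started from that env.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = python_path
    return env


if __name__ == "__main__":
    env = agent_environment()
    # Pass `env` to whatever starts the agent, e.g. subprocess.run(..., env=env).
    print(f"{ENV_VAR}={env[ENV_VAR]}")
```

Exporting the variable in the shell that starts the agent should have the same effect; the wrapper only makes the chosen interpreter explicit.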