Thanks, I will give it a try.
Who/What created the initial experiment?
I created the initial experiment from the command line, with either "python folder/script.py" or "python -m folder.script".
Both end up with the experiment not running. I am attaching an agent daemon log where the initial experiment was called with "python folder/script.py".
Why isn't the entry point just the python script?
The entry point is folder.script and not just the script because I need the 'current' folder while running the script ...
As written above, I did the right click clone, then I did right click enqueue.
The experiment reported 'running', and immediately after preparing the environment it reported 'completed', without actually running my code. Please look at the beginning of this thread for output logs and more details.
AgitatedDove14 , I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.
The only thing I need to do is clone my experiment. Can you help me make this happen?
AgitatedDove14 , I noticed that if I run the initial experiment by "python -m folder_name.script_name" then the script path contains the whole list of arguments as you observed.
On the other hand, if I run the initial experiment by "python folder_name/script_name.py", then the script path contains only 'script_name.py'.
In both cases I cannot clone the experiment, with the same results as I reported in my initial message.
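For reference, here is a minimal sketch of how the two launch styles mentioned above relate to each other. The helper names (module_name_from_script, launch_commands) are hypothetical and are not part of ClearML; the sketch only shows the path-to-module mapping and notes the standard Python behaviour that makes the two styles differ.

```python
from pathlib import PurePosixPath


def module_name_from_script(script_path: str) -> str:
    """Turn a relative script path such as 'folder/script.py' into 'folder.script'."""
    path = PurePosixPath(script_path)
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got: {script_path}")
    # Drop the extension and join the remaining path components with dots.
    return ".".join(path.with_suffix("").parts)


def launch_commands(script_path: str) -> dict:
    """Return both launch styles for the same script (hypothetical helper)."""
    return {
        # Script style: Python puts the script's own directory first on sys.path.
        "script": f"python {script_path}",
        # Module style: Python puts the current working directory first on sys.path,
        # so imports relative to the 'current' folder keep working.
        "module": f"python -m {module_name_from_script(script_path)}",
    }


if __name__ == "__main__":
    for style, command in launch_commands("folder_name/script_name.py").items():
        print(f"{style}: {command}")
```

The difference in the first sys.path entry is the usual reason to prefer the "-m" form when the script imports sibling packages from the folder it is launched from.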
Yes, I create the experiment by calling Task.init.
As you suggested, in the experiment tab I define the script path and the working directory.
Again, the task only created the environment and after that reported 'completed' without running my code.
Attaching the log of the last run, with the setting of the script and the folder.
Attached are the agent log and the task log
Bingo (I guess). My code is local, with multiple files. I will try to connect it to a git repo and let you know how it worked.
Does the agent support uncommitted changes in multiple files (on top of a git commit)?
But the python command does not have such arguments (--script, --cwd). What am I missing?
Or, do you mean that those should be added to the Args list when cloning?
Could it be that the file you are trying to run is not in the repository?
It is unclear what file is missing. The only hint is "KeyError: '.'" and I am not sure what that refers to. All my code files are in the repository. Maybe the problem is with some installed package file?
Are you running inside a Docker container?
No, I am running inside a conda environment.
Any chance you can send the full log?
What I sent is the full agent daemon log. If you are asking for the console...
TimelyMouse69, yes, the experiment ran successfully the first time, before I cloned it.
Just to make sure, by running ES migration you mean running elastic_upgrade.py again. Correct?
It took ~36 hours two days ago.
The ES migration log is attached in the 1st message of this thread. Do you see any problems in it?
Is there any way to check whether the ES migration results are valid?
Is there any log that maybe details the problem?
Update: I ran the mongo migration script (clearml-server-1.2.0-migration.py) and now I can see my projects!
Now there is a new problem: I don't see any of the logs: console, artefacts, scalars, plots.
Can you help?
Yes I've performed the ES migration. The data is in clearml/data/elastic_7.
AppetizingMouse58, SuccessfulKoala55 and AgitatedDove14, after running the ES migration for the 2nd time the problem is solved. Thank you all for your help!
I see such arguments (--script, --cwd) in the command 'clearml-task', but I am not using it. What I do is run my script ('python folder/script.py') and create a task inside it, using Task.init().
The sequence is unclear then:
I followed the instructions in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_es7_migration/ .
Stage 5 ("python elastic_upgrade.py") ended successfully.
Then I skipped "Upgrading to ClearML Server v.1.2. or Newer" and went straight to "Completing the Installation".
Did I do something wrong? What should I do to fix it?
My original trains server version was 0.14, if I remember correctly. Is there anywhere I can check it now that the upgrade has been done?
My new clearml server is 1.5. I get that from http://localhost:8080/version.json but if there is somewhere else I should look, let me know.
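The version check described above can be scripted. The following is a minimal sketch, assuming the web server is reachable at the URL quoted in the message. The layout of version.json is not shown in this thread, so the sketch returns the whole parsed object and does not rely on any particular key; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_version_payload(payload: str) -> dict:
    """Parse the text of version.json and return it as a dictionary."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def fetch_version_info(url: str = "http://localhost:8080/version.json",
                       timeout: float = 5.0) -> dict:
    """Download version.json from the web server and parse it.

    The default URL is the one quoted in the message; adjust host and port
    for a different deployment.
    """
    with urlopen(url, timeout=timeout) as response:
        return parse_version_payload(response.read().decode("utf-8"))


# Usage (requires a running server, so it is left as a comment):
#   info = fetch_version_info()
#   print(info)
```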
Is it ok to restore data/mongo from my backup, and leave all the other files that were created by elastic_upgrade.py (e.g., data/elastic_7) untouched?
What I mean, is: Do I need to run elastic_upgrade.py again, or just the mongo upgrade (clearml-server-1.2.0-migration.py)?
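Before restoring data/mongo or re-running either migration script, it may be worth taking a timestamped copy of the current data directories so that each attempt can be rolled back. This is a minimal sketch, not an official ClearML procedure: snapshot_directory is a hypothetical helper, the paths in the usage comment are assumptions based on the clearml/data/... layout mentioned in this thread, and it assumes the server containers are stopped while the copy runs.

```python
import shutil
import time
from pathlib import Path


def snapshot_directory(source: str, backup_root: str) -> Path:
    """Copy `source` into `backup_root` under a timestamped name and return the copy's path."""
    source_path = Path(source)
    if not source_path.is_dir():
        raise FileNotFoundError(f"not a directory: {source_path}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source_path.name}-{stamp}"
    # copytree creates missing parent directories and fails if `target` already exists,
    # so an earlier snapshot is never overwritten.
    shutil.copytree(source_path, target)
    return target


# Usage (paths are assumptions; adjust to the actual installation):
#   snapshot_directory("clearml/data/mongo", "clearml/backups")
#   snapshot_directory("clearml/data/elastic_7", "clearml/backups")
```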
I will try it and keep you posted. Thanks!
I am using a self hosted server.
I suspect that maybe the server gets stuck when I compare a large number of experiments (~10). Can that be possible?
It recovered by itself after an hour or two. So I will look for those 'network' logs and UI console logs the next time it happens.
I was still having the issue and then I recalled an old solution, that worked again today. Here it is:
F12 --> Application tab --> Storage --> Clear site data --> refresh the ClearML login screen
I did not upgrade anything and did not do docker pull.
I am having a temporary network issue. I will send the output of "docker inspect" as soon as I can reconnect to my server.
I can enter my user name but even the button underneath it is blank (see below). Once clicking it, the whole screen is blank as in the 1st image that I sent.