Your problem is that you didn't set the correct permissions for the new directory (as specified in the installation instructions): sudo chown -R 1000:1000 <trains-data-folder>
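If you want to double-check the result of the chown, something like the sketch below could list anything under the data folder that is still owned by someone else. This is only an illustration: `find_wrong_owner` is a hypothetical helper (not part of ClearML), and the 1000:1000 defaults are simply taken from the command above.

```python
import os


def find_wrong_owner(root, uid=1000, gid=1000):
    """Return every path under root (root included) not owned by uid:gid.

    Hypothetical helper for illustration; the default ids mirror the
    `chown -R 1000:1000` command quoted above.
    """
    # Collect the root itself plus every directory and file below it.
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    mismatched = []
    for path in candidates:
        st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(path)
    return mismatched
```

Pass your data folder as `root`; an empty list means every entry already has the expected owner and group.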
Did you make sure the clearml-agent was not installed in a venv?
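One quick way to check is to run a snippet like this with the same Python interpreter that has clearml-agent installed. It is a generic Python check, not a ClearML feature, and `in_virtualenv` is a hypothetical helper name; it only tells you whether that interpreter lives in a venv.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if the interpreter appears to be running inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix points at the base installation. The arguments exist
    only so the comparison can be exercised with explicit values.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    print("running inside a venv:", in_virtualenv())
```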
It's enough that the task requirements will contain this latest version
UnevenDolphin73 are you using the latest clearml RC?
MagnificentMosquito84 what was the error message? Also, how did you try deleting the experiments?
Hi PerplexedRaccoon19 , what you're missing is all the ElasticSearch data
Well, if the task was indeed running, it's strange that it was stopped since tasks have a thread that is in charge of pinging the server to make sure the server knows they're still running, so maybe there was some network issue?
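To make the idea of a pinging thread concrete, here is a minimal sketch of a background keep-alive loop. It is not the ClearML implementation: the `Heartbeat` class, the `send_ping` callback, and the interval are all assumptions made for illustration.

```python
import threading
import time


class Heartbeat:
    """Hypothetical keep-alive thread: calls send_ping every interval_sec."""

    def __init__(self, send_ping, interval_sec):
        self._send_ping = send_ping
        self._interval = interval_sec
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (time to ping again)
        # and True once stop() has been called (exit the loop).
        while not self._stop.wait(self._interval):
            self._send_ping()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    hb = Heartbeat(lambda: print("ping"), interval_sec=0.5)
    hb.start()
    time.sleep(2)  # stand-in for the real work of the task
    hb.stop()
```

If `send_ping` cannot reach the server (for example during a network outage), the server side simply stops seeing pings, which is the situation described above.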
Hey ReassuredTiger98 - ClearML is actually part of the http://Allegro.ai offering (includes also ClearML Enterprise), so everyone LOL 🙂
Why do you ask?
DeterminedOwl36 I think for now you can't change this name
ClearML Agent Services is a ClearML Agent running on the server machine with the --services option
once the server is up, you cannot change it
DeterminedOwl36 can you please open a GitHub issue about it?
Hi JitteryCoyote63 ,
This message usually means the worker (agent) in question just sent a request to the ClearML Server (and received a response) after not communicating for more than 10 minutes
no, step 1, 2 and 3 are unrelated (and still there)
Hi SubstantialElk6 ,
Are you using the template_yaml argument?
Hi SubstantialBaldeagle49 , in order to upgrade to v1.2, you will need to run a migration, see https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration/
Hi JitteryCoyote63 ,
The clearml-server asked the clearml-agent to stop the task because it didn't get anything for a long time?
Seems so - there's a "non-responsive tasks" watchdog on the server in charge of doing exactly that. I assume you're using a self-hosted server?
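Roughly speaking, a watchdog of that kind compares each task's last report time against a timeout. The sketch below only illustrates the idea and is not the server's code: `find_non_responsive`, the task ids, and the two-hour timeout are made up for the example.

```python
from datetime import datetime, timedelta


def find_non_responsive(last_seen, now, timeout):
    """Return the ids of tasks whose last report is older than timeout.

    last_seen maps a task id to the datetime of its most recent report.
    Hypothetical helper; the real watchdog logic and threshold may differ.
    """
    return sorted(
        task_id for task_id, seen in last_seen.items() if now - seen > timeout
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    last_seen = {
        "task_a": now - timedelta(minutes=5),  # reported recently
        "task_b": now - timedelta(hours=3),    # silent for a long time
    }
    # The two-hour timeout is an arbitrary value chosen for this example.
    print(find_non_responsive(last_seen, now, timedelta(hours=2)))
```

Only tasks that have been silent for longer than the timeout are returned, so a task that keeps pinging is never selected.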
Hi UpsetDuck81 , where is this pip install command coming from? This doesn't look like the ClearML Agent command format
If you can look around and maybe help with a PR that would be awesome 🙂
If you only want to move the server to another machine and not upgrade the server version, you should just copy the folder and restart the server there, but change nothing
OK, this indeed seems like v0.16 🙂
How did you set the agent for these GPUs?
In this scenario, I assume this would have to be pulled somehow from the secret manager on a ClearML remote run - how would ClearML know which user's data should be pulled from the secret manager? I assume your remote executions are using the agent's docker mode?
Hi AppetizingPelican85 , how exactly did you configure and run the agent?