Afaik, clearml-agent will use existing installed packages if they fit the requirements.txt. E.g. pytorch >= 1.7 will only install PyTorch if the environment does not already provide some version of PyTorch greater than or equal to 1.7.
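A minimal sketch of expressing such a requirement from code (package, version spec, and project/task names are just examples; a requirements.txt in the repository works the same way):
```
from clearml import Task

# Must be called before Task.init. If the agent's environment (e.g. the docker
# image) already provides torch >= 1.7, the agent reuses it instead of reinstalling.
Task.add_requirements("torch", ">=1.7")

task = Task.init(project_name="examples", task_name="reuse-preinstalled-packages")
```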
Okay, but are your logs still stored on MinIO when only using sdk.development.default_output_uri?
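For reference, sdk.development.default_output_uri controls where model checkpoints and artifacts are uploaded; the same destination can also be set per task via the output_uri argument. A minimal sketch, with a placeholder MinIO endpoint and bucket:
```
from clearml import Task

# Placeholder MinIO endpoint and bucket; the credentials for it would still
# come from the sdk.aws.s3 section of clearml.conf.
task = Task.init(
    project_name="examples",
    task_name="minio-output",
    output_uri="s3://my-minio-host:9000/clearml-bucket",
)
```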
@<1576381444509405184:profile|ManiacalLizard2> I'll check again 🙂 thanks
Thank you very much. I also saw a solution based on systemd, among others, so I am wondering what the best way is, or does it even matter?
Is there a way to specify this on a per-task basis? I am running clearml-agent in docker mode, btw.
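A hedged sketch of choosing the container per task when the agent runs in docker mode (the image name is a placeholder; the same field can also be edited in the task's execution settings in the web UI):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="per-task-docker")

# Placeholder image: an agent running in docker mode will execute this task inside it.
task.set_base_docker("nvidia/cuda:11.1-runtime-ubuntu20.04")
```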
The default behavior mimics Python’s assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.
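Assuming this refers to argument validation in torch.distributions, a small sketch of turning it off explicitly once a model is working:
```
import torch
from torch import distributions

# Disable argument validation globally (same effect as the default under python -O),
# or per instance via validate_args=False.
distributions.Distribution.set_default_validate_args(False)

normal = distributions.Normal(loc=torch.zeros(3), scale=torch.ones(3), validate_args=False)
print(normal.sample())
```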
Then, if the first agent is assigned a task from queue B and the next task is from queue A, it will have to wait, even though in theory there would be capacity for it if the first task had been executed on the second agent initially.
@<1576381444509405184:profile|ManiacalLizard2> Maybe you are using the enterprise version with the vault? I suppose the enterprise version runs differently, but I don't have experience with it.
For the open-source version, each clearml-agent uses its own clearml.conf
Hey Martin, thank you for answering!
I see your point; however, in my opinion this is really unexpected behavior. Sure, I can do some work to make it "safe", but shouldn't that be the default? So throw an error when there is no clearml.conf and expect CLEARML_USE_DEFAULT_SERVER=1.
So clearml 1.0.1, clearml-agent 1.0.0, and clearml-server from master
Okay, thank you anyways. I was just asking because I thought I had seen such a setting before. Must have been something different.
A colleague fixed my server and I can confirm that the fix works!
What's the reason for the shift?
I am currently on the open-source version, so no Vault. The environment variables are not meant to be used on a per-task basis, right?
When the task is aborted, the logs will show up, but the scalar logs never appear. The scalar logs only appear when the task finishes.
Okay, no worries. I will check first. Thanks for helping!
Interesting. Will probably only matter for very small experiments, or experiments where validation is run very infrequently.
Ah, sorry, I should have been more specific. I mean on the ClearML server.
I just tested with remote execution and the problem seems to exist there, too. It is just that when the task switches from local to remote execution (i.e. exits the local script), the local scalars will appear, but no scalars from the remote execution will show up. So the iteration count will not update either. However, at least for remote execution I get live console output.
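For reference, a minimal sketch of the local-to-remote switch described here, assuming the standard execute_remotely() call (the queue name is a placeholder):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="remote-run")

# Everything above this call runs locally; the call below stops the local
# process and enqueues the task so an agent picks it up and re-runs it.
task.execute_remotely(queue_name="default", exit_process=True)

# From here on, the code only runs on the agent.
for i in range(10):
    task.get_logger().report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
```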
Maybe this opens up another question, which is more about how clearml-agent is supposed to be used. The "pure" way would be to make the docker image provide everything, and clearml-agent should do no setup at all.
What I currently do instead is let the docker image provide all system dependencies and let clearml-agent set up all the Python dependencies. This allows me to reuse a docker image across different experiments. However, then it would make sense to have as many configs as possib...
I am currently on the move, but it was something like "upstream server not found" in /etc/nginx/nginx.conf and, if I remember correctly, line 88.
Maybe let's put it in a different way:
Pipeline:
Preprocess Task -> Main Task -> Postprocess Task
My main task is my experiment, so my training code. When I ran the main task standalone, I just used Task.init
and set up the project name, task name, etc.
Now what I could do is push this task to the server, then just reference the task by its task-ID and run the pipeline. However, I do not want to push the main task to the server before running. Instead I want to push the whole pipeline, but st...
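A hedged sketch of building such a pipeline from functions (assuming a clearml version that provides PipelineController.add_function_step; all names are placeholders), so the main task does not need to exist on the server beforehand:
```
from clearml import PipelineController


def preprocess():
    # placeholder preprocessing logic
    return "dataset-id"


def train(dataset_id):
    # placeholder training logic -- this would be the "main task"
    return "model-id"


def postprocess(model_id):
    # placeholder postprocessing logic
    print("finished with", model_id)


pipe = PipelineController(name="my-pipeline", project="examples", version="0.0.1")
pipe.add_function_step(name="preprocess", function=preprocess, function_return=["dataset_id"])
pipe.add_function_step(
    name="main",
    function=train,
    function_kwargs={"dataset_id": "${preprocess.dataset_id}"},
    function_return=["model_id"],
)
pipe.add_function_step(
    name="postprocess",
    function=postprocess,
    function_kwargs={"model_id": "${main.model_id}"},
)

# Run the controller locally and send the steps to an agent queue,
# or use pipe.start(queue="...") to enqueue the controller itself.
pipe.start_locally(run_pipeline_steps_locally=False)
```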
Perfect, works! I was looking for "host"; it didn't occur to me to search for "worker". Any idea how to get the user that created the task?
It seems to work when I enable conda_freeze.