@<1523701205467926528:profile|AgitatedDove14> Thank you very much for your guidance. Setting these manually works for me!
```
docker-compose ps
          Name                      Command               State                  Ports
clearml-agent-services   /usr/agent/entrypoint.sh         Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...   Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
```
Mhhm, good hint! Unfortunately, I can't see anywhere in the logs when the server creates a delete request.
I usually also experience no problems with restarting the clearml-server. It seems like it has to do with the OOM (or whatever issue I have).
It didn't revert. Just one of my colleagues that I wanted to introduce to clearml put his clearml.conf in the wrong directory and pushed his experiments to the public server.
So I do not blame clearml for this mistake, but generally, designing the system to be fail-safe is better than hoping that everything is used as it was designed 🙂
Wouldn't it be enough to just require a call to `clearml-init`, and throw an error when running without a `clearml.conf` that tells the user to run `clearml-init` first?
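A minimal sketch of that fail-safe idea (the path, function name, and error message here are my own illustration, not ClearML's actual behavior):

```python
import os

def require_config(conf_path="~/clearml.conf"):
    """Hypothetical guard: refuse to run without a config instead of
    silently falling back to default (public) server settings."""
    path = os.path.expanduser(conf_path)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"No config found at {path} - please run `clearml-init` first."
        )
    return path
```

This way, a misplaced config file fails loudly at startup rather than pushing experiments to the wrong server.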
Okay, I found something out: when I use the docker image `ubuntu:22.04`, it does not spin up a service agent and aborts the task. When I use `python:latest`, everything works fine!
@<1576381444509405184:profile|ManiacalLizard2> Yea, that makes sense. However, my problem is that I do not want to set it on the remote clearml-agent, since every user may have a different storage. E.g. one user pushes to Azure, while another one pushes to S3.
Artifact Size: 74.62 MB
Okay, great! I just want to run the cleanup services, however I am running into ssh issues so I wanted to restart it to try to debug.
Thank you very much. I am going to try that.
Is there a clearml.conf for this agent somewhere?
Afaik, clearml-agent will use existing installed packages if they satisfy the requirements.txt. E.g. `pytorch >= 1.7` will only install PyTorch if the environment does not already provide some version of PyTorch greater than or equal to 1.7.
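To illustrate that kind of check, here is a hypothetical sketch (not the actual clearml-agent code, which handles full PEP 440 specifiers): install only when the already-installed version fails the minimum requirement.

```python
def satisfies_min(installed: str, minimum: str) -> bool:
    """Simplified comparison for plain dotted numeric versions only."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)

print(satisfies_min("1.9.0", "1.7"))  # True  -> agent reuses the installed package
print(satisfies_min("1.6.0", "1.7"))  # False -> agent installs a newer version
```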
Okay, but are your logs still stored on MinIO when only using `sdk.development.default_output_uri`?
It is only a single agent that is sending a single artifact. server-->agent is fast, but agent-->server is slow.
The agent and server also have similar hardware, so I would expect the same read/write speed.
Agent runs in docker mode. I ran the agent on the same machine as the server this time.
I see a `python3 fileserver.py` running on a single thread with 100% load.
Yea, it was finished after 20 hours. Since the artifact only started uploading after the experiment finished, there is no reporting for the time it was uploading. I will debug it and report what I find out.
@<1576381444509405184:profile|ManiacalLizard2> I'll check again 🙂 thanks
Thank you very much. I also saw a solution based on systemd and many more, so I am wondering what the best way is, or whether it even matters?
Is there a way to specify this on a per task basis? I am running clearml-agent in docker mode btw.
481.2130692792125 seconds
Done
The default behavior mimics Python's `assert` statement: validation is on by default, but is disabled if Python is run in optimized mode (via `python -O`). Validation may be expensive, so you may want to disable it once a model is working.
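As a minimal illustration of that on/off behavior (assuming CPython's built-in `__debug__` flag, which `python -O` sets to `False`; the `validate` function is my own example):

```python
def validate(output):
    # This check runs only when Python is NOT started with -O,
    # exactly like any other assert statement.
    assert output is not None, "model produced no output"

# __debug__ is True by default and False under `python -O`, so the same
# flag can also guard arbitrary expensive validation code:
if __debug__:
    validate("some model output")
```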
Then, if the first agent is assigned a task from queue B and the next task is of type A, that task will have to wait, even though in theory there is capacity for it, because the first task could have been executed on the second agent instead.