I just want to avoid ClearML leaving files lingering around. Btw: a better default behavior, in my opinion, would be to delete tasks only after their files have been deleted, and to delete the task anyway only with the force option!
Let me try it another time. Maybe something else went wrong.
Or better, some cache option. Otherwise the cron job is what I will use 🙂 Thanks again
Yea, but before, in my original setup, the config file was filled. I just added some lines to the config and now the error is back.
Wait, nvm. I just tried it again and now it worked.
For example, in our case we do reinforcement learning, and we would call a script like this: python run_openai_gym.py some_package.my_agent
I have my development machine where I develop for multiple projects. I want to configure ClearML differently based on the project, similar to .vscode, .git or .idea at the project level.
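One workaround sketch: ClearML reads the CLEARML_CONFIG_FILE environment variable to locate an alternate config file, so a small helper (the function name and fallback path are my own choices, not ClearML API) can point it at a project-local config before clearml is imported:

```python
import os
from pathlib import Path

def use_project_config(project_dir: str, filename: str = "clearml.conf") -> str:
    """Point ClearML at a project-local config file if one exists,
    falling back to the global ~/clearml.conf otherwise.
    Must run BEFORE `import clearml`, which reads the env var at import time."""
    local_conf = Path(project_dir) / filename
    if local_conf.is_file():
        # CLEARML_CONFIG_FILE is the documented env var for an alternate config path
        os.environ["CLEARML_CONFIG_FILE"] = str(local_conf)
        return str(local_conf)
    return os.path.expanduser("~/clearml.conf")
```

Calling `use_project_config("/path/to/my_project")` at the top of the entry script would then give each project its own config, roughly like a per-project .vscode directory.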
Thank you for the quick reply. Maybe someone knows whether there is an option to let Docker delete images after the container exits?
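As far as I know, Docker's `--rm` flag only removes the *container* on exit; the image itself stays and has to be reclaimed with `docker image prune`. A minimal sketch (just assembling the command, names illustrative) to make the distinction explicit:

```python
def docker_run_cmd(image, *args, auto_remove=True):
    """Build a `docker run` command as a list of argv tokens.
    `--rm` removes the *container* when it exits; it does NOT delete
    the image. Dangling/unused images are reclaimed separately with
    `docker image prune` (or `docker system prune`)."""
    cmd = ["docker", "run"]
    if auto_remove:
        cmd.append("--rm")
    cmd.append(image)
    cmd.extend(args)
    return cmd
```

So the closest thing to "delete after exit" is `--rm` for containers plus a periodic `docker image prune` (e.g. from a cron job) for the images.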
Mhhm, good hint! Unfortunately I can't see anywhere in the logs when the server creates a delete request.
So just tried again and still it does not work.
This is what is in .ssh on my clearml-agent:
-rw------- 1 tim tim 1,5K Apr  8 14:28 authorized_keys
-rw-rw-r-- 1 tim tim  208 Apr 29 11:15 config
-rw------- 1 tim tim  432 Apr  8 14:53 id_ed25519
-rw-r--r-- 1 tim tim  119 Apr  8 14:53 id_ed25519.pub
-rw------- 1 tim tim  432 Apr 29 11:16 id_gitlab
-rw-r--r-- 1 tim tim  119 Apr 29 11:25 id_gitlab.pub
-rw-rw-r-- 1 tim tim 3,1K Apr 29 11:33 known_hosts
I am wondering because, when used in docker mode, the docker container may have a CUDA version that differs from the host's. However, ClearML seems to use the host version instead of the container's, which is sometimes a problem.
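One way to see the discrepancy is to compare what the host's nvidia-smi header reports (driver view) with what the container's nvcc --version reports (toolkit view). A sketch of just the parsing, with illustrative sample strings:

```python
import re
from typing import Optional

def cuda_version_from_smi(smi_output: str) -> Optional[str]:
    """Extract 'CUDA Version: X.Y' from `nvidia-smi` header output
    (this is the host/driver-supported version)."""
    m = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return m.group(1) if m else None

def cuda_version_from_nvcc(nvcc_output: str) -> Optional[str]:
    """Extract the toolkit version from `nvcc --version` output
    (this is what is actually installed inside the container)."""
    m = re.search(r"release\s*([\d.]+)", nvcc_output)
    return m.group(1) if m else None
```

Running the first against the host and the second inside the container would show exactly the mismatch described above.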
Thank you. Yes we need to wait for carla to spin up.
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.
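To sanity-check what the config file itself contains, independent of the agent's summary, a naive grep-style lookup is enough. Note clearml.conf is HOCON, so this sketch only handles the simple `key: value` form and ignores nesting and comments:

```python
import re
from pathlib import Path

def read_flag(conf_path: str, key: str = "system_site_packages"):
    """Return True/False for a simple `key: true|false` line in a
    clearml.conf-style file, or None if the key is absent.
    Naive on purpose: no HOCON nesting or comment handling."""
    text = Path(conf_path).read_text()
    m = re.search(rf"{re.escape(key)}\s*[:=]\s*(true|false)", text, re.IGNORECASE)
    return m.group(1).lower() == "true" if m else None
```

If this returns False while the agent's pre-execution summary prints true, the discrepancy is on the agent side (e.g. a different config file or an env-var override), not in the file.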
I am still not getting why it is a problem to just update the requirements at any time... 😕
Yea, it was finished after 20 hours. Since the artifact otherwise only starts uploading when the experiment finishes, there is no reporting for the time during which it uploaded. I will debug it and report what I find out.
Yea, correct! No problem. Uploading such large artifacts as I am doing seems to be an absolute edge case 🙂
Yea, and the script ends with clearml.Task - INFO - Waiting to finish uploads
I see a python3 fileserver.py process running on a single thread with 100% load.
481.2130692792125 seconds
Done
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.
Agent runs in docker mode. I ran the agent on the same machine as the server this time.
Artifact Size: 74.62 MB
AgitatedDove14 Yea, I also had this problem: https://github.com/allegroai/clearml-server/issues/87 I have a Samsung 970 Pro 2TB on all machines, but maybe something is misconfigured, like SuccessfulKoala55 suggested. I will take a look. Thank you for now!
Hi TimelyMouse69
Thank you for answering, but I do not think these methods allow me to modify anything that is set in clearml.conf. Rather, they just do logging.
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
It is only a single agent that is sending a single artifact. server-->agent is fast, but agent-->server is slow.
But it is not related to network speed, rather to ClearML. A simple file transfer test gives me approximately 1 GBit/s between the server and the agent, which is what I would expect from the 1 Gbit/s network.
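For reference, the arithmetic behind the comparison (assuming the 481 s figure above is the upload time for the 74.62 MB artifact; both numbers are from this thread):

```python
def throughput_mb_s(size_bytes: float, seconds: float) -> float:
    """Average transfer rate in MB/s (1 MB = 1e6 bytes)."""
    return size_bytes / 1e6 / seconds

# Observed artifact upload: 74.62 MB in ~481.2 s -> well under 1 MB/s,
# while the raw 1 Gbit/s link would allow up to ~125 MB/s.
upload_rate = throughput_mb_s(74.62e6, 481.213)
link_limit = 1e9 / 8 / 1e6  # 1 Gbit/s expressed in MB/s
```

So the observed upload rate is orders of magnitude below the link capacity, which is why I suspect the fileserver rather than the network.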