
Makes sense, but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
Okay, thanks for explaining!
Hi SuccessfulKoala55
I meant that in the WebUI deletion should only be allowed for artifacts for which deletion actually works.
For example I now have a lot of lingering artifacts that exist on the fileservers, but not on the clearml-api-server (I think).
Another example: I delete a task via WebUI. ClearML-server tries to delete the task and the artifacts belonging to the task. However, it will show that the task has been successfully deleted but some artifacts have not. Now there is no way...
One thing I want to add: maybe you could disable deleting the artifact record if the file-server deletion fails. It doesn't make sense that we lose the ability to track existing files if something goes wrong.
And how do I specify this fileserver as output_uri?
- solves it. I did not know this was possible.
Is there a way to specify this on a per-task basis? I am running clearml-agent in docker mode, btw.
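For reference, something like the following seems to be the per-task way to do it (a minimal sketch; the project name, task name, and fileserver URL below are placeholders, and I believe sdk.development.default_output_uri in clearml.conf would set the same thing globally instead):

```python
from clearml import Task

# Minimal sketch: point a single task's artifacts/models at a specific fileserver.
# Replace the placeholder URL and names with your own values.
task = Task.init(
    project_name="my-project",                # hypothetical project name
    task_name="per-task-output-uri-example",  # hypothetical task name
    output_uri="http://my-fileserver:8081",   # placeholder fileserver address
)
```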
I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which finally leads to the CPU-version install.
Thank you very much. I also saw a solution based on systemd and many more, so I am wondering what the best way is, or whether it even matters.
I was wrong: I think it uses the agent.cuda_version, not the local env CUDA version.
# Python 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
aiostream==0.4.2
attrs==20.3.0
clearml==0.17.4
dm-control==0.0.355168290
dm-env==1.4
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
humanfriendly==9.1
imageio-ffmpeg==0.4.3
jsonschema==3.2.0
labmaze==1.0.3
lxml==4.6.2
moviepy==1.0.3
orderedmultidict==1.0.1
pathlib2==2.3.5
pillow==7.2.0
proglog==0.1.9
psutil==5.8.0
pybullet==3.0.9
pygame==2.0.1
pyglet==1.5.0
pyjwt==2.0.1
pyrsistent==0.17.3
requests-file==1.5.1
tensorboard...
Thank you very much!
Hi SuccessfulKoala55, thank you very much.
Is there some way to verify the server uses the correct configuration files? (E.g. see it in the logs/web UI.) I just tried it and it does not work.
At least I can see the async_delete service complains about a missing secret, so I can start debugging there. I am using the same config as for my agents, but somehow for async_delete it does not work...
Yea, that I knew 😄 But somehow I didn't think about the clearml.conf
So it definitely seems to be a problem with docker and not with clearml. However, I do not get why it works for you but not on any of my machines (all Ubuntu 20.04 with Docker 20.10).
Thank you very much. I tested it on a different machine now and it works like intended. So there must be something misconfigured with this one machine.
Okay, no worries. I will check first. Thanks for helping!
Yea, something like this seems to be the best solution.
I don't know, actually. But the PyTorch documentation says it can make a difference: https://pytorch.org/docs/stable/distributions.html#torch.distributions.distribution.Distribution.set_default_validate_args
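For context, this is the switch I mean (a small sketch; whether it noticeably affects performance probably depends on the model):

```python
import torch
from torch.distributions import Distribution, Normal

# Globally disable argument validation for all torch.distributions objects.
Distribution.set_default_validate_args(False)

# Distributions created afterwards skip the (potentially costly) validity checks.
dist = Normal(loc=torch.zeros(3), scale=torch.ones(3))
sample = dist.rsample()
```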
One question: does clearml resolve the CUDA version from the driver or from conda?
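Just to illustrate what I mean by the two versions (a small sketch, assuming nvidia-smi is on the PATH and a CUDA-enabled PyTorch build is installed in the conda env):

```python
import re
import subprocess

import torch  # assumes a CUDA-enabled PyTorch build from the conda env

# CUDA version the driver supports, as reported by nvidia-smi.
smi_output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
match = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
print("Driver CUDA:", match.group(1) if match else "not found")

# CUDA version the local (conda) environment's torch build was compiled against.
print("Conda env CUDA:", torch.version.cuda)
```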