Eureka! Give me 5 min and I'll send the full log
So the environment variables are not set by the clearml-agent, but by clearml itself
And clearml-agent should pull these datasets from network storage...
Unfortunately, not. Quick question: is there caching happening somewhere besides .clearml? Does the boto3 driver create a cache?
I restarted it after I got the errors, because as everyone knows, turning it off and on usually works 😄
Maybe deletion happens "async" and is not reflected in parts of clearml? It seems that if I retry the deletion often enough, at some point it is successful
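That "retry until it sticks" behavior can at least be scripted; a generic sketch in plain Python (not a clearml API — `flaky_delete` below is a stand-in for the real delete call):

```python
import time

def retry(fn, attempts=5, delay=1.0):
    """Call fn until it succeeds, sleeping between attempts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# stand-in for the real delete call: fails twice, then succeeds
calls = {"n": 0}
def flaky_delete():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("deletion not reflected yet")
    return "deleted"

print(retry(flaky_delete, delay=0.0))  # prints "deleted" on the third attempt
```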
How can I see that?
I have venv_update.enabled: true and detect_with_conda_freeze: true
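For context, these are set in my clearml.conf, roughly like this (a sketch — the section layout may differ between versions, so check your version's template):

```
# sketch of the relevant clearml.conf entries
agent {
    venv_update {
        # reuse and update an existing virtualenv instead of rebuilding it
        enabled: true
    }
}
sdk {
    development {
        # record the environment via conda freeze instead of pip freeze
        detect_with_conda_freeze: true
    }
}
```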
I got some warnings about broken packages. I cleaned the conda cache with `conda clean -a` and now it installed fine!
AgitatedDove14 SuccessfulKoala55 Could you briefly explain whether clearml supports no-copy add for datasets?
Yea, the real problem is that I have very large datasets in network storage. I am looking for a way to add the datasets on the network storage as a clearml-dataset.
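What I'm hoping for is something like an external-links flow — a sketch, assuming a clearml version where `clearml-data add --links` (or `Dataset.add_external_files` in the SDK) registers links to the files instead of copying them; the project name and path are placeholders:

```shell
# sketch: register files from network storage by link, without copying
clearml-data create --project MyProject --name my-large-dataset
clearml-data add --links /mnt/network-storage/my-dataset
clearml-data close
```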
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
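For now I do the aggregation outside the UI: pull the scalar series per experiment (however you export them) and combine them with plain Python — a minimal sketch with made-up numbers:

```python
from statistics import mean, median

# scalar series from three experiments, aligned by iteration (made-up values)
runs = {
    "exp_a": [0.90, 0.80, 0.70],
    "exp_b": [0.95, 0.85, 0.60],
    "exp_c": [0.85, 0.75, 0.65],
}

# transpose: one tuple of values per iteration, across experiments
per_iter = list(zip(*runs.values()))
agg = {
    "mean":   [mean(v) for v in per_iter],
    "median": [median(v) for v in per_iter],
    "max":    [max(v) for v in per_iter],
}
print(agg["median"])  # the per-iteration median across the three runs
```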
I guess this is the current way to do it: https://github.com/tensorflow/tensorboard/issues/39#issuecomment-568917607 so I would say: Yes, it supports gif.
Yea, tensorboardX is using moviepy.
pytorch.tensorboard is the same as tensorboardx https://github.com/pytorch/pytorch/blob/6d45d7a6c331ddb856ac34a76bcd3613aa05185b/torch/utils/tensorboard/summary.py#L461
Sure, no problem!
I installed as told on pytorch.org: `pip3 install --pre torch torchvision torchaudio --index-url`
And how do I specify this fileserver as output_uri?
I guess the supported storage backends (e.g. S3, Ceph, etc.) don't have this issue, right?
Thanks for researching this issue. If you have time, you can create the issue since you are way more knowledgeable, but I can also open it if you do not have time 🙂
Yea, when the server handles the deletes everything's fine, and imo that is how it should always have been.
I don't think it is a viable option. You are looking at the best case, but I think you should expect the worst from the users 🙂 Also, I would rather know there is a problem and have some clutter than hide it and never be able to fix it, because I cannot identify which artifacts are still in use without spending a lot of time comparing artifact IDs.
Can you give me an example how I can add a second fileserver?
I created a GitHub issue because the problem with the slow deletion still exists: https://github.com/allegroai/clearml/issues/586#issue-1142916619
And how do I specify this in the output_uri? The default file server is selected by passing True. How would I specify to use the second one?
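In clearml.conf terms, I assume what I want looks something like this (the address is hypothetical, and presumably output_uri would also accept that URL string in place of True):

```
# sketch: make a second file server the default upload target
sdk {
    development {
        # hypothetical address of the second file server
        default_output_uri: "http://second-fileserver:8081"
    }
}
```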