Eureka! I got some warnings about broken packages. I cleaned the conda cache with `conda clean -a` and now it installed fine!
AgitatedDove14 SuccessfulKoala55 Could you briefly explain whether clearml supports no-copy add for datasets?
Yea, the real problem is that I have very large datasets on network storage. I am looking for a way to add the datasets on the network storage as a clearml dataset without copying them.
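For context, something like this rough sketch is what I'm after — assuming `Dataset.add_external_files` registers links instead of copying, and that the mount path below (a placeholder) is reachable from every machine that uses the dataset:

```python
from clearml import Dataset

# Sketch: register files from network storage by link instead of uploading them.
# Assumes a recent clearml SDK that provides Dataset.add_external_files.
dataset = Dataset.create(dataset_name="my_large_dataset", dataset_project="datasets")

# Adds links (URLs/paths) to the dataset state without copying the actual files.
dataset.add_external_files(source_url="file:///mnt/network_storage/my_large_dataset/")

dataset.upload()    # uploads only the small dataset state, not the linked files
dataset.finalize()
```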
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
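For reference, here is roughly how I would pull it client-side instead — a sketch where the task IDs and metric/series names are placeholders, and I'm assuming `get_reported_scalars()` returns a `{metric: {series: {"x": [...], "y": [...]}}}` dict:

```python
import numpy as np
from clearml import Task

# Placeholder task IDs and metric names for illustration.
task_ids = ["<task_id_1>", "<task_id_2>", "<task_id_3>"]
metric, series_name = "loss", "validation"

series = []
for task_id in task_ids:
    task = Task.get_task(task_id=task_id)
    scalars = task.get_reported_scalars()
    series.append(np.asarray(scalars[metric][series_name]["y"]))

# Align on the shortest run so the element-wise aggregation is well defined.
n = min(len(s) for s in series)
stacked = np.stack([s[:n] for s in series])

print("mean:  ", stacked.mean(axis=0))
print("median:", np.median(stacked, axis=0))
print("max:   ", stacked.max(axis=0))
```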
Yea, tensorboardX is using moviepy.
pytorch's tensorboard (torch.utils.tensorboard) is the same as tensorboardX: https://github.com/pytorch/pytorch/blob/6d45d7a6c331ddb856ac34a76bcd3613aa05185b/torch/utils/tensorboard/summary.py#L461
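For example, logging a video through either package goes through that same summary code and needs moviepy installed to encode it — a small sketch (the log dir and tensor shape are just for illustration):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# add_video encodes the tensor as a GIF via moviepy, the same code path
# tensorboardX uses, so moviepy must be installed for this to work.
writer = SummaryWriter(log_dir="./runs/video_demo")

# (N, T, C, H, W): 1 clip, 16 frames, 3 channels, 64x64 pixels, values in [0, 1].
frames = torch.rand(1, 16, 3, 64, 64)
writer.add_video("random_clip", frames, global_step=0, fps=4)
writer.close()
```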
Sure, no problem!
I installed as told on pytorch.org: `pip3 install --pre torch torchvision torchaudio --index-url ...`
And how do I specify this fileserver as the `output_uri`?
I guess the supported storage mediums (e.g. S3, Ceph, etc.) don't have this issue, right?
Thanks for researching this issue. If you have time, you can create the issue since you are way more knowledgeable, but I can also open it if you do not have time 🙂
Yea, when the server handles the deletes everything's fine and, imo, that is how it should always have been.
I don't think it is a viable option. You are looking at the best case, but I think you should expect the worst from the users 🙂 Also I would rather know there is a problem and have some clutter than to hide it and never be able to fix it because I cannot identify which artifacts are still in use without spending a lot of time comparing artifact IDs.
Can you give me an example of how I can add a second fileserver?
I created a GitHub issue because the problem with the slow deletion still exists. https://github.com/allegroai/clearml/issues/586#issue-1142916619
And how do I specify this in the `output_uri`? The default fileserver is specified by passing `True`. How would I specify that the second one should be used?
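What I imagine is something like this sketch — passing the second fileserver's URL instead of `True` (the host and port below are placeholders for wherever the second fileserver is reachable):

```python
from clearml import Task

# True selects the default files_server from clearml.conf; a full URL should
# direct artifact uploads to a specific fileserver instead.
task = Task.init(
    project_name="examples",
    task_name="second fileserver test",
    output_uri="http://second-fileserver.example.com:8081",
)
```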
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
Now for some reason everything is gone... 😕
```
[2021-05-07 10:53:00,566] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.061s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", lin...
```
In the WebUI it just shows that an error happened after the loading bar has been running for a while.
I tried to delete the same tasks again and this time, it instantly confirmed deletion and the tasks are gone.
Seems to happen only while the cleanup_service is running!
But the problems seem to be recurring.
No idea what's happening there.
I have no idea whether it is a user error or because of the clearml-server update...
I got the error again. Seems to happen only when I try to delete "large" experiments.
No problem in my case at least.
```
[2021-05-07 10:52:00,282] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.058s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", lin...
```
When I select many experiments, it will only delete some and show an error message that some could not be deleted. But if I only select a few, everything works fine.
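As a workaround I might try deleting them one at a time through the SDK so each delete stays small — a rough sketch (the project name and tag filter are placeholders, and I'm assuming `Task.get_tasks` accepts a `tags` filter):

```python
from clearml import Task

# Fetch the experiments to delete; filters below are placeholders.
tasks = Task.get_tasks(project_name="my_project", tags=["to_delete"])

# Delete one task at a time so each server-side delete stays small.
for t in tasks:
    ok = t.delete(delete_artifacts_and_models=True)
    print(("deleted " if ok else "failed to delete ") + t.id)
```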