Reputation
Badges 1
611 × Eureka!Nono, I got to thank you for this awesome tool!
Very nice!
Maybe for the long-term future you could look into how to make better use of vertical space. Currently, there are 7 (5 in fullscreen mode)= different sections from content to top of the page. Maybe a compact mode would be nice or less space for content headlines.
Now I get:
ollecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
...
I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that leads to conflicts which finally leads to the cpu-version install.
Thank you very much, didnt know about that 🙂
@<1523701087100473344:profile|SuccessfulKoala55> I just did the following (everything locally, not with clearml-agent)
- Set my credentials and S3 endpoint to A
- Run a task with Task.init() and save a debug sample to S3
- Abort the task
- Change my credentials and S3 endpoint to B
- Restart the taskThe result are lingering files in A that seem not to be associated with the task. I would expect ClearML to instead error the task or to track the lingering files somewhere, so they can ma...
I use fixed users!
Perfect, just what I always wanted. Looking forward to the MinIo version. Thank you:)
In my case I use the conda freeze option and do not even have CUDA installed on the agents.
So I just updated the env that clearml-agent created (and where pytorch cpu is installed) with my local environment.yml and now the correct version is installed, so most probably the `/tmp/conda_envaz1ne897.yml`` is the problem here
[2021-05-07 10:52:00,282] [9] [WARNING] [elasticsearch] POST ` [status:N/A request:60.058s]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", lin...
@<1523701087100473344:profile|SuccessfulKoala55> Only when I delete on self-hosted.
@<1523712723274174464:profile|LazyFish41> WebApp: 1.10.0-357 • Server: 1.10.0-357 • API: 2.24
This has been happening with every version of clearml-server ever. Most probably there should be a queue in front of ES, so it does not process to many request at the same time?
Hi CostlyOstrich36 , thank you for answering so quick. I think that s not how it works because if this was true, one would have to always match local machine to servers. Afaik clearml finds the correct PyTorch Version, but I was not sure how (custom vs pip does it)
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.
Tested with clearml-agent 1.0.1rc4/1.2.2 and clearml 1.3.2
I am wondering cause when used in docker mode, the docker container may have a CUDA Version that is different from the host version. However, ClearML seems to use the host version instead of the docker container's version, which is a problem sometimes.
Nvm, I think its my mistake. I will investigate.
I used the wrong docker container. The docker container I used had version 11.4. Interestingly, the override from clearml.conf and CUDA_VERSION Env variable did not work there.
With the correct docker container everything works fine. Shame on me.
But yeah, I see the point of enterprise having this feature and basic not 🙂
Mhhm, then maybe it is not clear 😂 to me how clearml.Task is meant to be used. I thought of it as being a container for all the information regarding a single experiment that is reflected on the server-side and by this in the WebUI. Now I init() a Task and it will show in the WebUI. I thought after initialization I can still update the task to my liking, i.e. it being a documentation of my experiment.
AgitatedDove14 fyi I think this is the issue I have: https://stackoverflow.com/a/65526944/3038183