btw: I am pretty sure this used to work, but then stopped working some time ago.
Then I could also do this:
```python
# My custom very special use case
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
await task.synchronize()
await run_code_analysis()
task.add_requirement("myreq")
await task.synchronize()
```
I don't know actually. But the PyTorch documentation says it can make a difference: https://pytorch.org/docs/stable/distributions.html#torch.distributions.distribution.Distribution.set_default_validate_args
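For context, a minimal sketch of what that setting toggles (the names are just the standard `torch.distributions` API from the linked docs; the performance effect is their claim, not measured here):
```python
import torch
from torch.distributions import Distribution, Normal

# Turn off argument validation globally; per the linked docs this skips
# sanity checks on distribution parameters/inputs and can make sampling
# and log_prob calls cheaper.
Distribution.set_default_validate_args(False)

dist = Normal(loc=torch.zeros(3), scale=torch.ones(3))
print(dist.log_prob(torch.zeros(3)))  # input values are not validated here
```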
Yeah, but before, in my original setup, the config file was filled in. I just added some lines to the config and now the error is back.
Maybe there is something wrong with my setup. Conda confuses me sometimes.
Nono, I got to thank you for this awesome tool!
I got the error again. Seems to happen only when I try to delete "large" experiments.
Very nice!
Maybe for the long-term future you could look into how to make better use of vertical space. Currently, there are 7 (5 in fullscreen mode) different sections between the actual content and the top of the page. Maybe a compact mode would be nice, or less space for the content headlines.
Now I get:
```
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
...
```
I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which in the end cause the CPU version to be installed.
Thank you very much, didnt know about that 🙂
@<1523701087100473344:profile|SuccessfulKoala55> I just did the following (everything locally, not with clearml-agent)
- Set my credentials and S3 endpoint to A
- Run a task with Task.init() and save a debug sample to S3
- Abort the task
- Change my credentials and S3 endpoint to B
- Restart the task (a minimal sketch of these steps is below)
The result is lingering files in A that do not seem to be associated with the task. I would expect ClearML to instead error the task or to track the lingering files somewhere, so they can ma...
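Roughly what the steps above look like in code (project/task names and the `output_uri` are placeholders; the credentials for "endpoint A" are whatever is configured in clearml.conf at that point):
```python
from clearml import Task, Logger
import numpy as np

# Sketch of the reproduction; "endpoint-a" and the bucket name are made up.
task = Task.init(
    project_name="s3-repro",
    task_name="debug-sample-upload",
    output_uri="s3://endpoint-a/bucket",  # hypothetical endpoint A
)

# Report a debug sample so something actually gets uploaded to storage A.
Logger.current_logger().report_image(
    title="debug", series="sample", iteration=0,
    image=np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8),
)

# Abort the task here, switch clearml.conf to endpoint B, and run the script
# again: the files already uploaded to A stay there, untracked.
```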
I think in the paid version there is this configuration vault, so that the user can pass their own credentials securely to the agent.
I use fixed users!
Perfect, just what I always wanted. Looking forward to the MinIO version. Thank you :)
In my case I use the conda freeze option and do not even have CUDA installed on the agents.
So I just updated the env that clearml-agent created (and where PyTorch CPU is installed) with my local environment.yml, and now the correct version is installed. So most probably the `/tmp/conda_envaz1ne897.yml` is the problem here.
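Roughly what I mean by "updated the env" (the env name below is a placeholder; clearml-agent generates its own name per task):
```bash
# Update the agent-created conda env in place from my local spec
conda env update -n clearml_agent_env -f environment.yml
```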
```
[2021-05-07 10:52:00,282] [9] [WARNING] [elasticsearch] POST ` [status:N/A request:60.058s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", lin...
```
@<1523701087100473344:profile|SuccessfulKoala55> Only when I delete on self-hosted.
@<1523712723274174464:profile|LazyFish41> WebApp: 1.10.0-357 • Server: 1.10.0-357 • API: 2.24
This has been happening with every version of clearml-server ever. Most probably there should be a queue in front of ES, so it does not process too many requests at the same time?
Thanks for researching this issue. If you have time, you can create the issue since you are way more knowledgeable, but I can also open it if you do not have time 🙂
Hi CostlyOstrich36, thank you for answering so quickly. I think that's not how it works, because if it were, one would always have to match the local machine to the servers. Afaik clearml finds the correct PyTorch version, but I was not sure how (whether custom logic or pip does it).
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.