I am using https://hub.docker.com/layers/nvidia/cuda/11.8.0-base-ubuntu22.04/images/sha256-88b85c6edd089acdf0cb7f3be020a1e812b009bafaf92c1715ab6677bd997ef1?context=explore
which has python 3.10.6 if I remember correctly.
Long story short, the Task requirements are async, so if one puts it after creating the object (at least in theory), it might be too late.
AgitatedDove14 Is there no await/synchronize method to wait for task update?
I only added
# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
I have no idea myself, but what the serverfault thread says about man-in-the-middle makes sense. However this also prohibits an automatic solution except for a shared known_hosts file I guess.
Sounds like a good hack, but not like a good solution. But thank you anyway!
Could you elaborate on that:
"So the agent failed to actually restore it from the git (files that are not added are not considered part of the git diff, this is usually git behavior)."
However, because of the import carla it is added to the task requirements and clearml-agent tries to install it, although it is meant to be included at runtime.
The default behavior mimics Python's assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.
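A minimal sketch of the behavior described in that quote, using only the standard library. Python sets the built-in `__debug__` to True in a normal run and to False under `python -O`, so a validation flag that defaults to `__debug__` behaves like `assert`. The class `Scale` and its `validate_args` parameter are hypothetical stand-ins for illustration, not the library's actual implementation.

```python
class Scale:
    """Hypothetical example class that validates its argument by default."""

    def __init__(self, value, validate_args=None):
        # Default mirrors assert: on in a normal run, off under `python -O`.
        if validate_args is None:
            validate_args = __debug__
        if validate_args and value <= 0:
            raise ValueError(f"value must be positive, got {value}")
        self.value = value


# Explicitly disabling validation skips the (possibly expensive) check.
unchecked = Scale(-1.0, validate_args=False)
```

Passing `validate_args=True` explicitly keeps the check active even in optimized mode, which can be useful while a model is still being debugged.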
In the beginning my config file was not empty.
Thank you very much for the quick answer. It is still so confusing to me that so many things are configured client-side.
Ok. I just wanted to make sure I have configured my agent properly. Just want to make sure I have to set it on all agents.
I am still trying to solve the add_requirements + importlib combo. If I use detect_with_freeze I cannot use add_requirements, and if I use automatic code analysis it will not find all packages because of importlib.
For now I have come to the conclusion that keeping a requirements.txt and making clearml parse the requirements from there should be the most robust solution. Unfortunately, there seems to be no way to do this with Task.init.
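To illustrate why static code analysis can miss packages that are loaded through importlib, here is a simplified sketch based on the standard `ast` module. It is not ClearML's analyzer, only an assumption about how import detection generally works; the module name `some_plugin` is hypothetical.

```python
import ast

# Example script: two regular imports and one dynamic import via importlib.
SOURCE = '''
import os
from json import dumps
import importlib

plugin = importlib.import_module("some_plugin")
'''


def statically_imported(source):
    """Return top-level package names that appear in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


# The dynamically loaded "some_plugin" is only a string argument,
# so it never shows up among the detected imports.
detected = statically_imported(SOURCE)
```

An explicit requirements.txt sidesteps this, because the package list no longer depends on what the analyzer can see in the source.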
So actually deleting from the client (e.g. a dataset with clearml-data) works.
By host you mean the machine on which the agent is running? How does clearml-agent find the cuda_version?
That I understand. But I think (old) pip versions will sometimes not resolve a package. Probably not the case the other way around.
Okay, no worries. I will check first. Thanks for helping!
Could be a clean log after restart. Unfortunately, I restarted the server right away. I will post if it happens again with the appropriate logs.
Depends on how you start the task afaik. I think clearml-task uses requirements.txt by default, but otherwise clearml will parse your file's dependencies, or if you changed it in clearml.conf it will use your conda/pip environment to generate the requirements.
Any idea why deletion of artifacts on my second fileserver does not work?
fileserver_datasets:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver-datasets
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver-datasets:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8082:8081"
ClearML successfu...
When I select many experiments it will only delete some and show an error message, that some could not be deleted. But if I only select a few, everything works fine.
Currently, my solution is to create an "agent-git" account, and users can give read access to this account, which the clearml-agent then uses to clone. However, I find access tokens to be a better solution. Unfortunately, clearml-agent removes the token from the git url.
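To make concrete what removing the token from a git url means, here is a small sketch using `urllib.parse`. This is not the agent's actual code, only an illustration of the effect; the function name `strip_credentials` and the repository URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without any user:password@ part in the network location."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Hypothetical repository URL with an embedded access token.
cleaned = strip_credentials("https://oauth2:TOKEN@git.example.com/group/repo.git")
```

After this kind of stripping, the clone request carries no token, so access has to come from somewhere else, such as the shared "agent-git" account.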
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.
I just tried the environment setup steps that clearml-agent is doing locally, but with my environment.yml instead of the one that clearml generates.