No, it is just a pain to find files that have been deleted by a user but are not actually deleted on the fileserver/S3 🙂
But no worries, nothing that is crucial.
I created this issue today, which can alleviate the pain temporarily: https://github.com/allegroai/clearml-server/issues/133
Now I get:
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
...
Oh, interesting!
So a pip version on a per-task basis makes sense ;D?
I am still trying to solve the add_requirements + importlib combo. If I use detect_with_freeze, I cannot use add_requirements, and if I use automatic code analysis, it will not find all packages because of importlib.
For now I have come to the conclusion that keeping a requirements.txt and making clearml parse the requirements from there should be the most robust solution. Unfortunately, there seems to be no way to do this with Task.init.
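A minimal sketch of that requirements.txt idea, for illustration only. `parse_requirements` and `register_requirements` are hypothetical helper names, not part of clearml. The sketch assumes that `Task.add_requirements(package_name, package_version)` is called before `Task.init`; whether it accepts a specifier string such as `>=1.9` may depend on the clearml version. The parser is deliberately simple and skips pip option lines such as `-r other.txt`.

```python
import re
from typing import List, Optional, Tuple

# Package name (optionally with extras), followed by whatever version specifier remains.
_REQ_LINE = re.compile(r"^([A-Za-z0-9_.\-\[\]]+)\s*(.*)$")


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, specifier) pairs; specifier is None when the line has no version."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like "-r"
            continue
        match = _REQ_LINE.match(line)
        if match:
            name, spec = match.group(1), match.group(2).strip()
            pairs.append((name, spec or None))
    return pairs


def register_requirements(path: str = "requirements.txt") -> None:
    """Hypothetical helper: hand every parsed requirement to clearml before Task.init."""
    from clearml import Task  # imported lazily so the parser works without clearml

    with open(path) as handle:
        for name, spec in parse_requirements(handle.read()):
            # Assumption: the second argument may carry the specifier string as-is.
            Task.add_requirements(name, spec)
```

With this, `register_requirements()` would be called once at the top of the script, followed by the usual `Task.init(...)`.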
` docker-compose ps
Name Command State Ports
clearml-agent-services /usr/agent/entrypoint.sh Restarting
clearml-apiserver /opt/clearml/wrapper.sh ap ... Up 0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
I just updated my server to 1.0 and now the services agent is stuck in restarting:
I see. But I just realized: Subsampling means you just show every nth datapoint, right? I still do not get why this leads to some 0.5 values when in my plot there should only be 0 and 1.
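To illustrate the difference in question: picking every nth point can only return values that are already in the series, while averaging over buckets can produce in-between values such as 0.5. This is a generic sketch and makes no claim about which method the ClearML plots actually use; both function names are made up for the example.

```python
from typing import List, Sequence


def subsample_every_nth(values: Sequence[float], n: int) -> List[float]:
    """Keep every nth datapoint; the result contains only original values."""
    return list(values[::n])


def downsample_by_mean(values: Sequence[float], n: int) -> List[float]:
    """Average consecutive buckets of n points; the result may contain new values."""
    buckets = [values[i:i + n] for i in range(0, len(values), n)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


signal = [0, 0, 1, 1, 0, 1, 1, 1]
print(subsample_every_nth(signal, 2))  # [0, 1, 0, 1]
print(downsample_by_mean(signal, 2))   # [0.0, 1.0, 0.5, 1.0]
```

So if a 0/1 series shows 0.5 after downsampling, some form of averaging or interpolation is likely involved, not plain every-nth selection.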
btw: why is agent.package_manager an agent attribute? Imo it does not make sense, because conda can install pip packages but pip cannot install conda packages, which can lead to install failures, right?
Wait, nvm. I just tried it again and now it worked.
Thank you for clearing that up 🙂
So just tried again and still it does not work.
This is what is in .ssh on my clearml-agent:
-rw------- 1 tim tim 1,5K Apr 8 14:28 authorized_keys
-rw-rw-r-- 1 tim tim 208 Apr 29 11:15 config
-rw------- 1 tim tim 432 Apr 8 14:53 id_ed25519
-rw-r--r-- 1 tim tim 119 Apr 8 14:53 id_ed25519.pub
-rw------- 1 tim tim 432 Apr 29 11:16 id_gitlab
-rw-r--r-- 1 tim tim 119 Apr 29 11:25 id_gitlab.pub
-rw-rw-r-- 1 tim tim 3,1K Apr 29 11:33 known_hosts
I have a related question: I read here that 4GB is an HTTP limitation and ClearML will not chunk single files. I take from that that ClearML either did not want to or did not need to implement its own solution so far. But what about models that are larger than 4GB?
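One manual workaround would be to split a large file into parts below the limit before uploading and to join them again after download. The sketch below is plain Python and not a ClearML feature; `split_file` and `join_files` are hypothetical helpers, and each part is read into memory in one go, so a real version would copy in smaller buffers.

```python
import os
from typing import List


def split_file(path: str, chunk_bytes: int, out_dir: str) -> List[str]:
    """Write consecutive parts of at most chunk_bytes each and return their paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(path, "rb") as source:
        index = 0
        while True:
            data = source.read(chunk_bytes)
            if not data:
                break
            part_path = os.path.join(out_dir, f"{os.path.basename(path)}.part{index:04d}")
            with open(part_path, "wb") as target:
                target.write(data)
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts: List[str], out_path: str) -> None:
    """Concatenate the parts, in the given order, back into a single file."""
    with open(out_path, "wb") as target:
        for part_path in parts:
            with open(part_path, "rb") as source:
                target.write(source.read())
```

The zero-padded part index keeps the parts in the right order when they are sorted by name.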
Well, I guess no hurdles vs. safety is inherently not solvable. I am all for hurdles if it is clear how to overcome them. And in my opinion, referring to clearml-init makes sense from both a developer and a user perspective.
Okay, I found something out: When I use docker image ubuntu:22.04 it does not spin up a service agent and aborts the task. When I used python:latest everything works fine!
One question: Does clearml resolve the CUDA Version from driver or conda?
Nvm. I think I understand. When the file has never been added to the repository, it is not tracked.
Hard to answer now. I just wiped everything and reinstalled. If I encounter this problem again, I will investigate further.
It is weird though. The task is submitted by the original user and then run on the agent. The task however is still registered by the original user, since it is created by the original user.
Wouldn't it make more sense to just inherit the user from the task rather than from the agent?
To answer my own question: In the WebUI where one inputs the credentials, use https for the host instead of the auto-added http
I am going to try it again and send you the relevant part of the logs in a minute. Maybe I am interpreting something wrong.
Another example of what I would expect:
` ### start_carla.py
from clearml import Task

def get_task():
    task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
    # The experiment is not run here. It only runs when this file is executed standalone or on a clearml-agent.
    return task

def run_experiment(task):
    ...

# This task can also be run standalone or by a clearml-agent
if __name__ == "__main__":
    task = get_task()
    run_experiment(task)
run_pi...
Exactly. I don't want people to circumvent the queue 🙂
Thank you. Seems like someone implemented a type check: Error: Dataset id=8d7355655830427f9243671c8cf0a6b0 is not of type Dataset :)