Btw: It is weird that the fileservers are directly exposed, so no authentication through the webserver is needed. Is this something that is different in the paid version, or why is it like that in the open-source version?
Tried to install cudatoolkit==11.1 manually in this environment and got:
```
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package xz conflicts for:
python=3....
```
Is this really working for you guys? I have no clue what's wrong. It seems so unlikely that my code works with artifacts and datasets, but not with logging...
Is there a simple way to get the response of the MinIO instance? Then I could verify whether the problem is on the MinIO side or in my client.
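One quick way to check would be hitting MinIO's unauthenticated liveness endpoint directly; a minimal sketch, assuming a standard MinIO deployment (host and port are placeholders):

```python
import requests

# Hypothetical MinIO endpoint; replace with your instance's host/port.
# /minio/health/live is MinIO's liveness probe and requires no credentials.
resp = requests.get("http://minio-host:9000/minio/health/live", timeout=5)
print(resp.status_code)            # 200 means the server itself is up
print(resp.headers.get("Server"))  # MinIO typically identifies itself here
```

If this returns 200 while uploads still fail, the problem is more likely on the client side (credentials, bucket name, or endpoint configuration).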
Now trying to change the default file server.
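For reference, this is roughly what that looks like in clearml.conf; a sketch, assuming a MinIO backend (host and bucket are placeholders, and the matching credentials would go under sdk.aws.s3):

```
api {
    # Point the default file server at a MinIO bucket (hypothetical host/bucket).
    files_server: "s3://minio-host:9000/clearml-bucket"
}
```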
Yea I know, I reported this 🙂 .
@<1576381444509405184:profile|ManiacalLizard2> Yea, that makes sense. However, my problem is that I do not want to set it on the remote clearml-agent, since every user may have a different storage. E.g. one user pushes to Azure, while another one pushes to S3.
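One way to get per-user destinations without touching the agent would be setting it per task; a minimal sketch using the SDK's output_uri parameter (project/task names and bucket URLs are placeholders):

```python
from clearml import Task

# output_uri controls where this task's artifacts and models are uploaded,
# so each user can pick their own backend without changing the agent.
task = Task.init(
    project_name="examples",              # hypothetical project
    task_name="per-user-storage",         # hypothetical task
    output_uri="s3://my-bucket/clearml",  # or e.g. "azure://container/path"
)
```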
SuccessfulKoala55 So what happens is that whenever the cleanup_service runs (or right after), clearml throws these kinds of errors.
It could be that the clearml-server misbehaves while cleanup is ongoing, or even after it has finished.
Here is part of the cleanup service log. Unfortunately, I currently cannot even download the full log, because the clearml-server just throws errors for everything.
How can I see that?
I have venv_update.enabled: true
and detect_with_conda_freeze: true
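For context, those two settings live in different sections of clearml.conf; a sketch of where they sit (values as stated above):

```
agent {
    venv_update {
        # reuse/update existing virtualenvs instead of rebuilding them
        enabled: true
    }
}
sdk {
    development {
        # freeze the active conda environment instead of analyzing imports
        detect_with_conda_freeze: true
    }
}
```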
Thanks a lot, now I think I understand.
Debug samples can only be controlled via api.files_server (or programmatically)
Could you guide me on how to approach this programmatically? Can I implement my own storage adapter for debug samples with ClearML interfaces, or am I on my own?
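One programmatic hook I know of is the logger's default upload destination, which redirects where debug samples get uploaded; a minimal sketch (project/task names and the bucket URL are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-samples")  # hypothetical names

# Redirect debug sample uploads for this task to a custom destination
# instead of the default files server.
task.get_logger().set_default_upload_destination("s3://my-bucket/debug-samples")
```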
@<1576381444509405184:profile|ManiacalLizard2> Thank you, but afaik this only works locally and not if you run your task on a clearml-agent!
@<1523701205467926528:profile|AgitatedDove14> Thank you very much for your guidance. Setting these manually works for me!
AgitatedDove14 Thank you, that explains it.
So the service is something that I can write and that intercepts the addition of a task to a queue?
I think I still don't get how clearml is supposed to work/be used. Why wouldn't the following work currently?
Example:
```python
from clearml import Task

task = Task.init(...)
if task.running_locally():  # i.e. not running under a clearml-agent
    # Export the task definition, replace the pinned torch requirement
    # with a minimum-version constraint, and write the list back.
    task_dict = task.export_task()
    requirements = task_dict["script"]["requirements"]["pip"].splitlines()
    requirement_torch = [r for r in requirements if r.startswith("torch==")]
    requirements.remove(requirement_torch[0])
    requirements.append("torch >= 1.8.1")
    task_dict["script"]["requirements"]["pip"] = "\n".join(requirements)
```
To update task.requirements before the task is actually created (the requirements are collected in a background thread)
Why can't it be updated after creation?
But you can manually add them with Task.add_requirements, no?
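For reference, a minimal sketch of that approach; note that, as far as I know, add_requirements must be called before Task.init (project/task names are placeholders):

```python
from clearml import Task

# Must be called before Task.init so the requirement is picked up
# when the task's package list is assembled.
Task.add_requirements("torch", ">= 1.8.1")

task = Task.init(project_name="examples", task_name="custom-reqs")  # hypothetical names
```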
In my opinion that's an ugly solution. I would have to keep track of which requirements are missing; then I would rather just add all requirements manually.
I am pretty sure there is a flag in the clearml.conf where you can specify which python binary to use.
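If I recall correctly it is the agent's python_binary setting; a sketch of what that might look like in clearml.conf (the path is a placeholder):

```
agent {
    # Force the agent to build virtualenvs with a specific interpreter
    # (hypothetical path; by default the agent uses its own python).
    python_binary: "/usr/bin/python3.8"
}
```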
I will debug this myself a little more.
When is the base_task_factory called? At runtime or definition time?
No no, I have to thank you for this awesome tool!
Oh, you are right. I did not think this through... To implement this properly gets too enterprisey for me, so I'll just leave it for now :D
Okay, great! I just want to run the cleanup service, but I am running into SSH issues, so I wanted to restart it to try to debug.