` docker-compose ps
Name Command State Ports
clearml-agent-services /usr/agent/entrypoint.sh Restarting
clearml-apiserver /opt/clearml/wrapper.sh ap ... Up 0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ... `
AgitatedDove14 Thank you, that explains it.
Depends on how you start the task, afaik. I think clearml-task uses requirements.txt by default; otherwise ClearML will parse your files' dependencies, or, if you changed it in clearml.conf, it will use your conda/pip environment to generate the requirements.
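For reference, you can also force the environment-freeze behavior from code rather than clearml.conf. A minimal sketch, assuming the Task.force_requirements_env_freeze and Task.add_requirements helpers exist in your clearml version (project/task names are just examples):
` from clearml import Task

# Freeze the full pip environment instead of letting clearml parse the
# script's imports; must be called before Task.init.
Task.force_requirements_env_freeze(force=True)

# Or pin a single extra requirement explicitly.
Task.add_requirements("numpy", "1.22.3")

task = Task.init(project_name="examples", task_name="requirements-demo") `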
Can you tell me which Python version is running on the agent/docker and which docker image?
Okay, thank you anyway. I was just asking because I thought I had seen such a setting before. Must have been something different.
Perfect! That sounds like a good solution for me.
No idea what's happening there.
Perfect and thank you for your efforts! :)
Based on https://github.com/lanpa/tensorboardX/blob/34d1616c035faaa0f3f7c9d19cb8bb4425f19939/tensorboardX/summary.py#L355 I would guess that it is already encoded before being added to the tensorboard summary.
One more thing: The cuda_version that clearml finds automatically is wrong.
` =============
== PyTorch ==
NVIDIA Release 22.03 (build 33569136)
PyTorch Version 1.12.0a0+2c916ef ... Looking in indexes: ,
Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages (22.0.4)
2022-04-07 16:40:57
Looking in indexes: ,
Requirement already satisfied: Cython in /opt/conda/lib/python3.8/site-packages (0.29.28)
Looking in indexes: ,
Requirement already satisfied: numpy==1.22.3 in /opt/conda/... `
I see. Thanks for explaining!
My agent shows the same as before:
` ...
Environment setup completed successfully
Starting Task Execution:
DONE: Running task 'aff7c6605b7243d38968f95b4351b127', exit status 0 `
When experimenting, we use an entrypoint script to which we pass the specific experiment.
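Roughly like this, as a sketch; entrypoint.py, run_experiment() and all names are placeholders for our actual setup:
` # entrypoint.py (sketch): select and run a specific experiment,
# registering it with ClearML along the way.
import argparse

from clearml import Task

def run_experiment(name):
    # placeholder for the real experiment dispatch
    print(f"running experiment: {name}")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--experiment", required=True)  # e.g. "baseline"
    args = parser.parse_args()

    Task.init(project_name="experiments", task_name=args.experiment)
    run_experiment(args.experiment)

if __name__ == "__main__":
    main() `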
Interesting: this command fails (with an error similar to the one I posted above) in conda version 4.7.12 but runs just fine in version 4.9.2: conda create --name test-pytorch python=3.8 cudatoolkit=11.1 -c conda-forge
Btw: It is weird that the fileservers are directly exposed, so no authentication through the webserver is needed. Is this different in the paid version, or why is it like that in the open-source version?
No problem. Sounds like a good solution, no need to implement something that has already been implemented somewhere else 🙂
Makes sense, but this means that we are not able to tell the clearml-agent where to save on a per-task basis? I see the output_destination set correctly in the ClearML web interface, but, as you say, the clearml-agent always uses its api.files_server?
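For context, this is how we set the destination on the task side, via Task.init's output_uri argument (the bucket path is just an example):
` from clearml import Task

# Per-task output destination: artifacts and models should go here
# instead of the agent's default files server (example bucket path).
task = Task.init(
    project_name="experiments",
    task_name="output-destination-demo",
    output_uri="s3://my-bucket/clearml-outputs",
) `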
I will try again tomorrow. It's getting late! Thank you for helping so far!
Actually, my current approach looks like this (see the sketch after the list):
carla-server-task : Launch a carla server instance on a random port, set the port as a param, and then block the task/process so I can kill carla when this task is aborted. This task keeps running the whole time.
start-carla-task : Launch a carla-server-task and wait for the port parameter to be set. Set the launched carla-server-task's task-id and the port as params. Set the task to completed.
main-task : Run the experiment when all start-carla-task are...
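A minimal sketch of the first two tasks; launch_carla_server(), the queue name, the port range and the parameter names are placeholders for our setup, not actual ClearML or CARLA APIs:
` import random
import subprocess
import time

from clearml import Task

def launch_carla_server(port):
    # placeholder: in reality this spawns the actual carla server process
    return subprocess.Popen(["echo", f"carla on port {port}"])

def carla_server_task():
    # carla-server-task: start carla on a random port, publish the port as
    # a parameter, then block so aborting the task also kills carla.
    task = Task.init(project_name="carla", task_name="carla-server-task")
    port = random.randint(20000, 30000)  # placeholder port range
    task.set_parameter("General/port", str(port))
    server = launch_carla_server(port)
    try:
        while True:
            time.sleep(10)
    finally:
        server.kill()

def start_carla_task():
    # start-carla-task: enqueue a carla-server-task, wait for its port
    # parameter, record the server task-id and port, then complete.
    task = Task.init(project_name="carla", task_name="start-carla-task")
    server_task = Task.create(project_name="carla",
                              task_name="carla-server-task",
                              script="carla_server.py")  # placeholder script
    Task.enqueue(server_task, queue_name="carla")  # placeholder queue
    while not server_task.get_parameter("General/port"):
        time.sleep(5)
        server_task.reload()
    task.set_parameter("General/server_task_id", server_task.id)
    task.set_parameter("General/port",
                       server_task.get_parameter("General/port"))
    task.mark_completed() `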
Thanks a lot. To summarize: to me, clearml is a framework, but I would rather have it be a library.
Other than that I am very happy with clearml, and it is probably my favorite machine-learning-related package of the last two years! 🙂 And thanks for taking so much time to talk to me!