with the CLI, on a conda env located in /data
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
JitteryCoyote63 it should just "freeze" after a while as it will constantly try to resend logs. Basically you should be fine 🙂
(If for some reason something crashed, please let me know so we can fix it)
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
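For reference, a minimal sketch for checking which directory Docker stores its data in and how much space is left on that disk before resizing. `docker info --format` and `docker system df` are standard Docker CLI commands; the helper name `avail_kb` is hypothetical, and the fallback to /var/lib/docker (the usual default on Linux) is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print the available space, in KB,
# on the filesystem that holds the given path.
avail_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Ask Docker where it keeps images and containers. Fall back to the usual
# Linux default if the docker CLI is missing or the daemon is unreachable
# (assumption: default install).
docker_root=$(docker info --format '{{ .DockerRootDir }}' 2>/dev/null) || docker_root=""
[ -n "$docker_root" ] || docker_root=/var/lib/docker
# If that directory does not exist on this machine, inspect / instead.
[ -d "$docker_root" ] || docker_root=/

echo "Docker data dir: $docker_root"
echo "Available there: $(avail_kb "$docker_root") KB"

# Compare with the /data volume mentioned above, if it exists here.
if [ -d /data ]; then
  echo "Available on /data: $(avail_kb /data) KB"
fi

# Per-category breakdown (images, containers, volumes, build cache).
if command -v docker >/dev/null 2>&1; then
  docker system df || true
fi
```

If the two paths report the same available space, they sit on the same filesystem, and growing that disk covers both.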
JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine, which means it is all stored inside the docker container.
Will it freeze/crash/break/stop the ongoing experiments?
JitteryCoyote63 how are you running the agent?
I was rather wondering why clearml was taking up space even though I had configured it to use the /data volume. But as you described AgitatedDove14 it looks like an edge case, so I don’t mind 🙂
it will constantly try to resend logs
Notice this happens in the background. In theory you will just get stderr messages when it fails to send, but the training should continue
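To illustrate the pattern described here (not ClearML's actual implementation), a toy sketch: a background sender keeps failing and retrying while the main loop carries on. Everything in it is hypothetical: `send_logs`, `resend_in_background`, and the marker file that stands in for "the server is back up".

```shell
#!/bin/sh
# Toy stand-in for "send logs to the server" (hypothetical): it succeeds only
# once the marker file exists, which plays the role of the server being up.
send_logs() {
  [ -f "$1" ]
}

# Retry sending in a background subshell so the caller is never blocked.
# $1 = marker file, $2 = max attempts, $3 = seconds between attempts.
resend_in_background() {
  (
    attempt=1
    while [ "$attempt" -le "$2" ]; do
      if send_logs "$1"; then
        exit 0
      fi
      echo "failed to send logs (attempt $attempt), retrying" >&2
      attempt=$((attempt + 1))
      sleep "$3"
    done
    exit 1
  ) &
  sender_pid=$!
}

marker="${TMPDIR:-/tmp}/server_up.$$"   # does not exist yet: "server is down"
rm -f "$marker"
resend_in_background "$marker" 20 1

# "Training" keeps running while the sender fails and retries in the background.
steps_done=0
for step in 1 2 3; do
  steps_done=$step
  sleep 1
done

touch "$marker"                         # "server is back up"
if wait "$sender_pid"; then sender_status=0; else sender_status=$?; fi
rm -f "$marker"
echo "training steps completed: $steps_done, sender exit status: $sender_status"
```

The failures only show up as messages on stderr; the loop standing in for training never waits on the sender.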
I have to admit mounting it to a different drive is a good reason to bring this feature back. The reasoning for dropping it was that mounting means the agent needs to make sure it manages those folders (e.g. multiple agents running on the same machine)