Note that this happens in the background. In theory you will just get stderr messages when it fails to send, but the training should continue.
JitteryCoyote63 it should just "freeze" after a while as it will constantly try to resend logs. Basically you should be fine 🙂
(If for some reason something crashed, please let me know so we can fix it)
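As a rough illustration of the "keep trying to resend, report failures on stderr" behaviour described above, here is a minimal shell sketch. It is not ClearML's actual implementation: the function name `retry_send` and its arguments are hypothetical, and the attempt limit exists only to keep the example bounded.

```shell
# Hypothetical helper: run a "send" command until it succeeds or the
# attempt limit is reached. Failures are reported on stderr only.
# Usage: retry_send MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_send() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0    # the send succeeded, stop retrying
    fi
    echo "send failed (attempt $attempt of $max_attempts), will retry" >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 1        # gave up after max_attempts failures
}
```

Running it as `retry_send 5 2 some_upload_command &` (where `some_upload_command` is a placeholder) keeps the retries in the background, so the foreground job continues while the server is unreachable.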
Will it freeze/crash/break/stop the ongoing experiments?
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
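For reference, Docker itself does have a daemon-level setting for this: the `data-root` key in `/etc/docker/daemon.json` controls where images, containers and their writable layers are stored. A minimal sketch, assuming a hypothetical target directory `/data/docker` on the larger volume (the path and the helper name `daemon_json` are illustrative and not from this thread):

```shell
# Assumption: /data/docker is a hypothetical directory on the larger volume.
DOCKER_DATA_ROOT="/data/docker"

# Print a daemon.json fragment that sets Docker's storage location.
# Review it and merge it by hand into /etc/docker/daemon.json.
daemon_json() {
  printf '{\n  "data-root": "%s"\n}\n' "$1"
}

daemon_json "$DOCKER_DATA_ROOT"

# After editing the file and restarting the Docker daemon, the active
# location can be checked with:
#   docker info --format '{{ .DockerRootDir }}'
```

Existing images and containers are not moved automatically when `data-root` changes, so this fits best on a machine where rebuilding them is acceptable.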
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
I was rather wondering why clearml was taking space while I configured it to use the /data volume. But as you described AgitatedDove14 it looks like an edge case, so I don’t mind 🙂
Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
I have to admit mounting it to a different drive is a good reason to bring this feature back. The reasoning for removing it was that mounting the builds means the agent needs to make sure it manages them (e.g. with multiple agents running on the same machine)
JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine, which means it is all stored inside the docker.
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
With the CLI, in a conda env located in /data
JitteryCoyote63 how are you running the agent?