
it actually looks like I don't need such a high number of files opened at the same time
So actually I don't need to play with this limit, I am OK with the default for now
mmmh it fails, but if I connect to the instance and execute ulimit -n, I do see 65535
while the tasks I send to this agent fail with: OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
Which gives me in the logs of the task: b'1024'
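As a side note, a shell-free way to check the same limit from inside the task would be Python's standard resource module (a minimal sketch, not code from the original thread):

import resource

# Soft and hard limits on open file descriptors for the current process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft limit: {soft}, hard limit: {hard}")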
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work
because at some point it introduces too much overhead I guess
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
with the CLI, in a conda env located in /data
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
Will it freeze/crash/break/stop the ongoing experiments?
I will try adding sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf"
to the extra_vm_bash_script, maybe that's enough actually
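If that limits.conf change doesn't propagate to the processes the agent spawns, another possible workaround is to raise the soft limit from inside the task itself, up to whatever hard limit the process already has (a sketch using Python's standard resource module, assuming the hard limit is high enough; not from the original thread):

import resource

# Raise the soft nofile limit up to the hard limit, for this process only;
# going above the hard limit would still require a system-level change.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"nofile soft limit raised from {soft} to {hard}")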
AgitatedDove14 This looks awesome! Unfortunately this would require a lot of changes in my current code; for that project I found a workaround. But I will surely use it for the next pipelines I build!
Not sure about that, I think you guys solved it with your PipelineController implementation. I would need to test it before giving any feedback
And I do that each time I want to create a subtask. This way I am sure to retrieve the task if it already exists
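For illustration, that get-or-create pattern might look roughly like this (a sketch assuming the ClearML SDK; the helper name and the error handling are my own assumptions, not the exact code from the thread):

from clearml import Task

def get_or_create_subtask(project_name, task_name):
    # Depending on the SDK version, Task.get_task may raise or return None
    # when no matching task exists, so handle both cases.
    try:
        existing = Task.get_task(project_name=project_name, task_name=task_name)
    except Exception:
        existing = None
    return existing or Task.create(project_name=project_name, task_name=task_name)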