Ok thanks! And for this?
Thanks AgitatedDove14 ! I created a project with a default output destination pointing to an S3 bucket, but I don't have local access to this bucket (only agents have access to it, for security reasons). Because of that, I cannot create a task in this project programmatically on my local machine, because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the SDK)
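For context, this is roughly the kind of per-task override I was looking for, assuming the output_uri argument of Task.init takes precedence over the project default (project/task names and the destination below are just placeholders):

```python
from clearml import Task

# Hypothetical workaround: point this specific task at a destination
# that is reachable locally instead of the project's S3 default
task = Task.init(
    project_name="my-project",          # placeholder
    task_name="local-debug",            # placeholder
    output_uri="file:///tmp/clearml",   # placeholder local destination
)
```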
I was rather wondering why clearml was taking up space when I had configured it to use the /data volume. But as you described AgitatedDove14, it looks like an edge case, so I don’t mind 🙂
I am now trying with agent.extra_docker_arguments: ["--network='host'", ]
instead of what I shared above
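For reference, this is roughly what that looks like in the agent's clearml.conf; note that docker itself expects --network=host, so the inner quotes around host are probably unnecessary (a sketch, not verified on my side):

```
agent {
    extra_docker_arguments: ["--network=host"]
}
```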
AgitatedDove14 I cannot confirm it 100%, the context is different (see previous messages), but it could be the same bug behind the scenes...
AgitatedDove14 I eventually found a different way of achieving what I needed
ok, and if that's not the case, it will fall back to 3.8, right? Would it be possible to support such a use case? (have the clearml-agent set up a different python version when a task needs it?)
I created a snapshot of both disks
same as the first one described
CostlyOstrich36, this also happens with clearml-agent 1.1.1 on an AWS instance…
Yeah, so I assume that training my models using docker will be slightly slower, so I'd like to avoid it. For the rest, using docker is convenient
you mean “docker” was not installed and it did not throw an error?
Yes, docker was not installed on the machine
Yes, you must make sure the docker container can mount a persistent folder for you to work in.
Ok, it would be nice to have a --user-folder-mounted that does the linking automatically
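In the meantime, a possible sketch of doing the mount by hand through the agent's extra docker arguments (the paths are purely illustrative, and the --user-folder-mounted flag above does not exist yet):

```
agent {
    extra_docker_arguments: ["-v", "/home/me/work:/workspace"]
}
```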
Alright, thanks for the answer! Seems legit then 🙂
Thanks for the hack! The use case is the following: I have a controller that creates training/validation/testing tasks by cloning (so that the parent task id is properly set to the controller). Otherwise I could simply create these tasks with Task.init, but then I would need to set the parent task manually for each one of these tasks, probably with a similar hack, right?
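Something like this minimal sketch is what I mean, assuming Task.clone accepts a parent argument (the template task id and queue name are placeholders):

```python
from clearml import Task

controller = Task.current_task()

# Clone a template task so its parent is set to the controller
child = Task.clone(
    source_task="<template_task_id>",  # placeholder
    name="training",
    parent=controller.id,
)
Task.enqueue(child, queue_name="default")  # placeholder queue
```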
This is no coincidence - any data versioning tool you will find is somehow close to how git works (dvc, etc.), since they all aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API that lets you register/retrieve datasets very easily
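For example, registering and later fetching a dataset through the Python API is only a few lines (dataset/project names and paths are illustrative):

```python
from clearml import Dataset

# Register a new dataset version from a local folder
ds = Dataset.create(dataset_name="my-dataset", dataset_project="datasets")
ds.add_files("data/train/")
ds.upload()    # push the files to the configured storage
ds.finalize()  # freeze this version

# Later (e.g. from a training task), get a local cached copy
local_path = Dataset.get(
    dataset_name="my-dataset", dataset_project="datasets"
).get_local_copy()
```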
AgitatedDove14 I made some progress:
In clearml.conf of the agent, I set sdk.development.report_use_subprocess = false (because I had the feeling that Task._report_subprocess_enabled = False wasn’t taken into account).
I’ve also set task.set_initial_iteration(0).
Now I was able to get the following graph after resuming -
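For completeness, a rough sketch of the resume setup described above (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(
    project_name="my-project",   # placeholder
    task_name="training",        # placeholder
    continue_last_task=True,     # keep reporting into the previous task when resuming
)
task.set_initial_iteration(0)    # don't offset iteration numbers after resuming
```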
Yes, it would be very valuable to be able to tweak that param. Currently it's quite annoying because it's set to 30 minutes, so when a worker is killed by the autoscaler I have to wait 30 minutes before the autoscaler spins up a new machine, because it thinks there are already enough agents available, while in reality the agent is down
I ended up dropping omegaconf altogether
Ah, got it. I am on a self-hosted server, that’s why I don’t see it
Which commit corresponds to the RC version? So far we tested with the latest commit on master (9a7850b23d2b0e1f2098ab051de58ce806143fff)
Alright, the experiment finished properly (all models uploaded). I will restart it to check again, but it seems like the bug was introduced after that