
Let's start with a simple setup: multi-node DDP in PyTorch
not really... what do you mean by "free" agent?
An easier fix for now would probably be some kind of warning to the user that a task is created but not connected
I think so. IMHO all API calls should maybe reside in a different module since they usually happen inside some control code
sounds great.
BTW the code is working now out-of-the-box. Just 2 magic lines - the import + Task.init
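(For anyone reading along, those two lines look roughly like this; the project/task names here are just placeholders:)
```python
from clearml import Task

# Registers the run with the clearml server and starts auto-logging
task = Task.init(project_name="examples", task_name="my experiment")
```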
Of course conda needs to be installed, it is using a pre-existing conda env, no?! what am I missing
it's not a conda env, just a regular venv (poetry in this specific case)
And the assumption is the code is also there?
yes. The user is responsible for the entire setup. The agent just executes python <path to script> <current hpo args>
how does this work with HPO? are the tasks generated in advance?
the hack doesn't work if conda is not installed 🙂
Regardless, it would be very convenient to add a flag to the agent that points it to an existing virtual environment and bypasses the entire setup process. This would make it easier to ramp up new users to clearml who don't want the bells and whistles and just want a simple HPO from an existing env (which may not even be part of a git repo)
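A hedged sketch of what I mean, using the CLEARML_AGENT_SKIP_PIP_VENV_INSTALL variable that comes up below (exact semantics vary between agent versions; the interpreter path and queue name are placeholders):
```bash
# Sketch only: reuse an existing interpreter and skip the pip install step
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/existing/venv/bin/python
clearml-agent daemon --queue my_queue
```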
AgitatedDove14, I'm running an agent inside a docker container (using the image on dockerhub) and mounted the host's docker socket so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars but not all of them 🙂
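For context, my setup is roughly this (image tag, mount paths, and worker id are my assumptions, not verified):
```bash
# The docker socket mount lets the agent spawn sibling containers;
# mounting a full clearml.conf covers options that env vars don't.
docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$HOME/clearml.conf":/root/clearml.conf \
  -e CLEARML_WORKER_ID=docker-agent-1 \
  allegroai/clearml-agent:latest
```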
I just don't fully understand the internals of an HPO process. If I create an Optimizer task with a simple grid search, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
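For concreteness, this is the kind of controller I mean (a minimal grid-search sketch; the base task id, parameter names, and metric names are placeholders):
```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange, GridSearch, HyperParameterOptimizer,
)

task = Task.init(project_name="examples", task_name="HPO controller")

optimizer = HyperParameterOptimizer(
    base_task_id="<template task id>",  # hypothetical: the task cloned per combination
    hyper_parameters=[
        DiscreteParameterRange("Args/lr", values=[0.001, 0.01, 0.1]),
        DiscreteParameterRange("Args/batch_size", values=[32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=GridSearch,
    execution_queue="default",
)
optimizer.start()  # clones the base task per combination and enqueues it
optimizer.wait()
optimizer.stop()
```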
another question - when running a non-dockerized agent and setting CLEARML_AGENT_SKIP_PIP_VENV_INSTALL, I still see things being installed when the experiment starts. Why does that happen?
AgitatedDove14 Just to check that I understood correctly - in an HPO task, are all subtasks (each a specific parameter combination) created and pushed to the relevant queue at the moment the main (HPO) task is created?
You mean running everything on a single machine (manually)?
Yes, but not limited to this.
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
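Something like this on each server is what I have in mind (paths and queue name are placeholders; whether the skip-venv variable fully covers this is exactly my question):
```bash
# One-off worker per server, reusing a venv that was set up manually
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/opt/sweep-venv/bin/python
clearml-agent daemon --queue hpo_sweep --detached
```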
lol great hack. I'll check it out.
Although I'd be really happy if there was a solution in which I can just spawn an ad-hoc worker 🙂
I'm trying to achieve a workflow similar to the one in wandb for parameter sweeps, where there are no venvs involved other than the one created by the user 🙂
that was my next question 🙂
How does this design work with a stateful search algorithm?
just seems a bit cleaner and more DevOps/k8s-friendly to work with the container version of the agent 🙂
It's a very convenient way of doing a parameter sweep with minimal setup effort
Thanks AgitatedDove14. I'll try that
cuDNN isn't CUDA, it's a separate library.
are you running docker on bare metal? you should have CUDA installed at /usr/local/cuda-<>
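A quick way to check on the host (the version suffix depends on the install):
```bash
ls -d /usr/local/cuda-*   # toolkit install locations
nvcc --version            # works if the toolkit's bin dir is on PATH
```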