another question - when running a non-dockerized agent and setting CLEARML_AGENT_SKIP_PIP_VENV_INSTALL, I still see things being installed when the experiment starts. Why does that happen?
I thought some sort of gang-scheduling scheme should be implemented on top of the job.
Maybe the agents should somehow go through a barrier with a counter and wait there until enough agents have arrived.
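The barrier-with-a-counter idea can be sketched in plain Python with `threading.Barrier` (the agent names and count here are stand-ins, not anything ClearML provides):

```python
import threading

NUM_AGENTS = 4  # hypothetical number of agents the job needs before starting

# A barrier with a counter: each "agent" blocks at wait() until all
# NUM_AGENTS parties have arrived, then they all proceed together.
barrier = threading.Barrier(NUM_AGENTS)
started = []

def agent(agent_id):
    # ... per-agent setup (venv, data, etc.) would happen here ...
    barrier.wait()            # block until enough agents have arrived
    started.append(agent_id)  # past the barrier: safe to begin the distributed job

threads = [threading.Thread(target=agent, args=(i,)) for i in range(NUM_AGENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(started))  # → 4
```

In a real multi-machine setup the barrier would have to live in some shared store rather than in-process threads, but the synchronization logic is the same.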
Regardless, it would be very convenient to add a flag to the agent which points it to an existing virtual environment and bypasses the entire setup process. This would make it easier to ramp up new users to ClearML who don't want the bells and whistles and would just like a simple HPO from an existing env (which may not even exist as part of a git repo)
I'm not working with tensorflow. I'm using SummaryWriter from torch.utils.tensorboard . Specifically add_pr_curve :
https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_pr_curve
great!
Is there a way to add this for an existing task's draft via the web UI?
oops. I used create instead of init
AgitatedDove14 , I'm running an agent inside a docker (using the image on dockerhub) and mounted the docker socket to the host so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars but not all of them
the hack doesn't work if conda is not installed
this is pretty weird. PL should only save from rank==0 :
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L394
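The expected behavior can be sketched as a plain rank-zero guard, the pattern that linked checkpoint connector code relies on (the function name and the plain-file "checkpoint" below are stand-ins for illustration, not PL's actual API):

```python
import os
import tempfile

def save_checkpoint(state, path, rank):
    # Rank-zero guard: only the process with rank == 0 writes the
    # checkpoint file; every other rank is a no-op.
    if rank != 0:
        return False
    with open(path, "w") as f:
        f.write(repr(state))
    return True

# usage: simulate 4 DDP ranks; only rank 0 should actually save
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ckpt.txt")
    results = [save_checkpoint({"epoch": 3}, path, rank) for rank in range(4)]

print(results)  # → [True, False, False, False]
```

If checkpoints show up from other ranks, something is bypassing that guard.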
Thanks AgitatedDove14 . I'll try that
so you don't have CUDA installed
You mean running everything on a single machine (manually)?
Yes, but not limited to this.
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
sounds great.
BTW the code is working now out-of-the-box. Just 2 magic lines - import + Task.init
Let's start with a simple setup: multi-node DDP in PyTorch
Of course conda needs to be installed, it is using a pre-existing conda env, no?! What am I missing?
it's not a conda env, just a regular venv (poetry in this specific case)
And the assumption is the code is also there ?
yes. The user is responsible for the entire setup. the agent just executes python <path to script> <current hpo args>
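That "agent just executes the script" contract is small enough to sketch with `subprocess` (the function name and the stand-in training script are hypothetical, purely for illustration):

```python
import os
import subprocess
import sys
import tempfile

def run_trial(script_path, hpo_args, python=sys.executable):
    """Hypothetical 'simple agent': run the user's script in the current
    (pre-built) environment with this trial's hyperparameter arguments.
    No venv creation, no git clone -- the user already set all that up."""
    cmd = [python, script_path] + [str(a) for a in hpo_args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout

# usage: write a stand-in "training script" and execute one HPO trial
with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "train.py")
    with open(script, "w") as f:
        f.write("import sys; print('lr =', sys.argv[1])\n")
    code, out = run_trial(script, ["0.01"])

print(code, out.strip())  # → 0 lr = 0.01
```

Everything else (which env `python` resolves to, whether the code is present) is the user's responsibility, exactly as described above.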
that was my next question
How does this design work with a stateful search algorithm?
can you initialize a tensor on the GPU?
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
lol great hack. I'll check it out.
Although I'd be really happy if there was a solution in which I can just spawn an ad-hoc worker
not really... what do you mean by "free" agent?