I thought some sort of gang-scheduling scheme should be implemented on top of the job.
Maybe the agents should somehow go through a barrier with a counter and wait there until enough agents have arrived
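A minimal sketch of that barrier-with-a-counter idea, using plain Python threads to stand in for the agents (the agent function and `NUM_AGENTS` are hypothetical, not ClearML API — `threading.Barrier` keeps an internal count and releases everyone at once when enough parties have arrived):

```python
import threading

NUM_AGENTS = 4  # hypothetical: how many agents must arrive before work starts

# threading.Barrier blocks each caller until `parties` threads have arrived,
# then releases them all together — i.e. a gang-scheduling point.
start_barrier = threading.Barrier(NUM_AGENTS)

results = []
lock = threading.Lock()

def agent(rank: int) -> None:
    # ... per-agent setup would happen here ...
    start_barrier.wait()  # wait until all agents have arrived
    with lock:
        results.append(rank)  # "work" only happens after the barrier releases

threads = [threading.Thread(target=agent, args=(i,)) for i in range(NUM_AGENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```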
This is pretty weird. PL should only save from rank==0:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L394
As a workaround I just stick the epoch number in the `series` argument of `report_scatter2d`, with the same title name
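A sketch of that workaround — baking the epoch number into the series name so every epoch shows up as its own series under one shared title. The helper name here is made up; `report_scatter2d` is the real ClearML `Logger` method, but it appears only in a comment so the snippet runs without ClearML installed:

```python
def epoch_series_name(base: str, epoch: int) -> str:
    # Hypothetical helper: encode the epoch in the series name so each
    # epoch becomes a separate series under the same plot title.
    return f"{base}_epoch_{epoch:03d}"

# With ClearML this would then be used roughly like:
#   logger.report_scatter2d(
#       title="predictions",                      # same title every epoch
#       series=epoch_series_name("scatter", 7),   # series varies per epoch
#       scatter=points,
#       iteration=0,
#   )

print(epoch_series_name("scatter", 7))  # -> scatter_epoch_007
```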
The legacy version worked just before I `mv`ed the folder, but now (after reverting to the old name) it doesn't work either 😢
An easier fix for now would probably be some kind of warning to the user that a task was created but not connected
oops. I used `create` instead of `init` 😳
not really... what do you mean by "free" agent?
Another question - when running a non-dockerized agent and setting `CLEARML_AGENT_SKIP_PIP_VENV_INSTALL`, I still see things being installed when the experiment starts. Why does that happen?
I'm trying to achieve a workflow similar to the one in wandb for parameter sweeps, where there are no venvs involved other than the one created by the user 😅
It's a very convenient way of doing a parameter sweep with minimal setup effort
AgitatedDove14 Just to see that I understood correctly - in an HPO task, are all subtasks (each a specific parameter combination) created and pushed to the relevant queue at the moment the main (HPO) task is created?
How does this work with HPO?
Are the tasks generated in advance?
AgitatedDove14 , I'm running an agent inside a Docker container (using the image on dockerhub) and have mounted the host's docker socket so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars but not all of them 😞
just seems a bit cleaner and more DevOps/k8s friendly to work with the container version of the agent 🙂
great!
Is there a way to add this for an existing task's draft via the web UI?
Thanks AgitatedDove14 . I'll try that
sounds great.
BTW the code is working now out-of-the-box. Just 2 magic lines: `import` + `Task.init`
lol great hack. I'll check it out.
Although I'd be really happy if there was a solution in which I can just spawn an ad-hoc worker 🙂
I just don't fully understand the internals of an HPO process. If I create an `Optimizer` task with a simple grid search, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
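One way the "generated in advance" idea can work for grid search: since the whole parameter space is known up front, all combinations can be enumerated before anything runs, and the dispatcher just hands each one to a subtask. This is a plain-Python sketch under that assumption, not ClearML's actual `HyperParameterOptimizer` internals:

```python
from itertools import product

# Hypothetical parameter space for the sweep
param_space = {
    "lr": [0.01, 0.1],
    "batch_size": [32, 64],
}

# Grid search: the full set of combinations exists before any task runs,
# so there is no ambiguity about which arguments were already dispatched —
# each subtask is simply assigned one entry of this precomputed list.
keys = list(param_space)
combinations = [dict(zip(keys, values)) for values in product(*param_space.values())]

for i, combo in enumerate(combinations):
    print(f"task {i}: {combo}")  # each combo would become one queued subtask

print(len(combinations))  # 2 values * 2 values = 4 combinations
```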