conda sets up CUDA, I think
Note that the CUDA version readout was only recently added to nvidia-smi
so you don't have CUDA installed
I'm trying to achieve a workflow similar to the one in wandb for parameter sweeps, where no venvs are involved other than the one created by the user
You mean running everything on a single machine (manually)?
Yes, but not limited to this.
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
AgitatedDove14, I'm running an agent inside a docker container (using the image on Docker Hub) and mounted the docker socket into it so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars, but not all of them
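For reference, one common pattern is to pass credentials through env vars and mount a full clearml.conf for everything else. A sketch, assuming the allegroai/clearml-agent image from Docker Hub; the CLEARML_* variable names and paths below should be double-checked against your agent version:

```shell
# Credentials and server endpoints via env vars; the rest of the
# agent config lives in a clearml.conf mounted into the container.
docker run -d --name clearml-agent \
  -e CLEARML_API_HOST=https://api.example.com \
  -e CLEARML_API_ACCESS_KEY=<access_key> \
  -e CLEARML_API_SECRET_KEY=<secret_key> \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $HOME/clearml.conf:/root/clearml.conf \
  allegroai/clearml-agent \
  clearml-agent daemon --queue default --docker
```

The mounted /root/clearml.conf covers the options that have no env-var equivalent.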
AgitatedDove14 Just to check that I understood correctly: in an HPO task, all subtasks (each a specific parameter combination) are created and pushed to the relevant queue at the moment the main (HPO) task is created?
I just don't fully understand the internals of an HPO process. If I create an Optimizer task with a simple grid search, how do the different tasks know which argument combinations were already dispatched, if the arguments are generated at runtime?
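For the grid-search case specifically, the whole search space can be enumerated up front, so there's no coordination problem between subtasks. A sketch of the general idea (not the actual optimizer internals):

```python
from itertools import product

def grid_combinations(param_space):
    """Enumerate every parameter combination up front, the way a
    grid-search optimizer can when the HPO task starts."""
    names = list(param_space)
    for values in product(*(param_space[n] for n in names)):
        yield dict(zip(names, values))

space = {"lr": [0.1, 0.01], "batch_size": [32, 64]}
combos = list(grid_combinations(space))
# Each dict becomes one subtask pushed to the execution queue;
# all 4 combinations are known before any subtask runs.
print(len(combos))  # 4
```

Strategies that depend on earlier results (e.g. Bayesian search) instead create subtasks incrementally from the controller, so only the controller needs to track what was dispatched.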
lol great hack. I'll check it out.
Although I'd be really happy if there were a solution where I could just spawn an ad-hoc worker
How does this work with HPO?
Are the tasks generated in advance?
great!
Is there a way to add this for an existing task's draft via the web UI?
as a workaround I just stick the epoch number in the series argument of report_scatter2d , with the same title name
Just to be clear: multiple CUDA runtime versions can coexist on a single machine, and the only thing that determines which one an application uses is the library search path (which can be set either with LD_LIBRARY_PATH or, preferably, by creating a file under /etc/ld.so.conf.d/ that contains the path to your CUDA directory and then running ldconfig)
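Concretely, something like this (the version number in the path is just an example, use whichever runtime you want active):

```shell
# Point the dynamic linker at a specific CUDA runtime's libraries
echo "/usr/local/cuda-10.1/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

# Verify which libcudart the linker now resolves
ldconfig -p | grep libcudart

# Per-shell alternative, without touching /etc:
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
```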
cuDNN isn't CUDA; it's a separate library.
Are you running Docker on bare metal? You should have CUDA installed at /usr/local/cuda-<>
That's the CUDA driver API; you need libcudart.so (the CUDA runtime)
It's a very convenient way of doing a parameter sweep with minimal setup effort
I think so. IMHO, all API calls should maybe reside in a separate module, since they usually happen inside some control code
Oops, I used create instead of init
The hack doesn't work if conda is not installed
Of course conda needs to be installed, it is using a pre-existing conda env, no? What am I missing?
It's not a conda env, just a regular venv (poetry in this specific case)
And the assumption is that the code is also there?
Yes. The user is responsible for the entire setup; the agent just executes python <path to script> <current hpo args>
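A minimal sketch of that kind of one-off worker, to make the idea concrete (the names here are mine, not any ClearML API): no venv creation, no code fetching, just running the user's script in the current environment with the arguments for one trial:

```python
import subprocess
import sys

def run_trial(script_path, hpo_args):
    """Run one hyperparameter trial: execute the user's script with
    the current interpreter and this trial's arguments, and return
    the script's exit code."""
    cmd = [sys.executable, script_path] + [str(a) for a in hpo_args]
    return subprocess.run(cmd, check=False).returncode
```

A pool of these on several servers, each pulling argument combinations from a shared queue, is essentially the distributed one-off agent described above.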
JitteryCoyote63 I still don't understand what is the actual CUDA version you are using on your machine
Thanks AgitatedDove14 . I'll try that