another question - when running a non-dockerized agent and setting CLEARML_AGENT_SKIP_PIP_VENV_INSTALL, I still see things being installed when the experiment starts. Why does that happen?
I'm trying to achieve a workflow similar to the one in wandb for parameter sweeps, where there are no venvs involved other than the one created by the user 😅
I'm trying to achieve a workflow similar to the one
You mean running everything on a single machine (manually)?
It's a very convenient way of doing a parameter sweep with minimal setup effort
ExcitedFish86 this is a general "dummy agent" that pulls tasks and executes them (no env created, no code cloned, as you suggested)
How does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multi-machine support. To overcome that, I posted a dummy agent that just runs the Tasks.
(Notice that the different Task arguments/hyperparameters are passed on the Task itself and are resolved at runtime by the code; you just need to tell it to do so. For example, it does not need to pass any arguments to the script in order to change the argparser: this is done at runtime when calling parse_args, and it is transparent.)
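A minimal pure-Python illustration of that effect, assuming a hypothetical `task_params` dict standing in for the values stored on the Task: the script parses an empty command line, yet ends up with the overridden values. This is only a sketch of the idea using `argparse.ArgumentParser.set_defaults`, not how ClearML implements it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # The script's own argparser, with its local defaults
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


def parse_with_overrides(parser, overrides, argv=None):
    """Hypothetical helper: inject values (e.g. fetched from the server)
    as new defaults, then parse an empty command line."""
    # set_defaults keys are argparse 'dest' names (dashes become underscores)
    parser.set_defaults(**overrides)
    return parser.parse_args([] if argv is None else argv)


# Values an HPO clone might carry on the Task (hypothetical)
task_params = {"lr": 0.1, "batch_size": 64}
args = parse_with_overrides(build_parser(), task_params)
print(args.lr, args.batch_size)  # 0.1 64
```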
Makes sense?
AgitatedDove14 Just to see that I understood correctly - in an HPO task, all subtasks (a specific parameter combination) are created and pushed to the relevant queue at the moment the main (HPO) task is created?
How does this work with HPO?
the tasks are generated in advance?
ExcitedFish86
How do I set the config for this agent? Some options can be set through env vars but not all of them 😞
Hmm okay, if you are running an agent inside a container and you want it to spin up "sibling" containers, you need to set the following:
1. Mount the docker socket into the container running the agent itself (as you did), basically adding `--privileged -v /var/run/docker.sock:/var/run/docker.sock`
2. Allow the agent to mount cache and configuration from the host into the sibling containers: `-e CLEARML_AGENT_DOCKER_HOST_MOUNT="/host/clearml/agent:/root/.clearml" -v /host/clearml/agent:/root/.clearml`
3. Finally, you can map the clearml.conf from the host directly into the container with `-v /host/clearml.conf:/root/clearml.conf`
Putting the three together you should end up with something like:
`docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -e "CLEARML_AGENT_DOCKER_HOST_MOUNT=/host/clearml/agent:/root/.clearml" -v "/host/clearml/agent:/root/.clearml" -v /host/clearml.conf:/root/clearml.conf clearml-agent`
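If the agent container is started from a launcher script, the same three pieces can be assembled programmatically. A sketch in Python; the host paths and the `clearml-agent` image name are placeholders (assumptions), so substitute your own.

```python
import shlex


def agent_docker_command(host_agent_dir: str, host_conf: str,
                         image: str = "clearml-agent") -> list:
    """Assemble the `docker run` arguments for a containerized agent.
    Paths and image name are placeholders (assumptions)."""
    container_cache = "/root/.clearml"
    return [
        "docker", "run", "--rm",
        # 1. let the agent spin up sibling containers via the host's docker daemon
        "--privileged", "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # 2. let sibling containers mount the agent's cache/config from the host
        "-e", f"CLEARML_AGENT_DOCKER_HOST_MOUNT={host_agent_dir}:{container_cache}",
        "-v", f"{host_agent_dir}:{container_cache}",
        # 3. reuse the host's clearml.conf inside the agent container
        "-v", f"{host_conf}:/root/clearml.conf",
        image,
    ]


cmd = agent_docker_command("/host/clearml/agent", "/host/clearml.conf")
print(shlex.join(cmd))
```

Building an argument list (rather than one string) lets you pass it straight to `subprocess.run` without shell quoting issues.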
wdyt?
BTW: any reason to run the agent inside a container and not on the host itself?
AgitatedDove14, I'm running an agent inside a docker container (using the image on Docker Hub) and mounted the host's docker socket into it so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars but not all of them 😞
just seems a bit cleaner and more DevOps/k8s friendly to work with the container version of the agent 🙂
ExcitedFish86 that said, if running in docker mode you can actually pass it on a per-Task basis with `-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python` as an additional docker container argument on the Task "Execution" tab itself.
great!
Is there a way to add this for an existing task's draft via the web UI?
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
ExcitedFish86 Oh if this is the case:
in your clearml.conf:
`agent.package_manager.type: conda`
`agent.package_manager.conda_env_as_base_docker: true`
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L79
Then in the Task "base docker image" you can write the path to your conda env 🙂 (it will treat it as a read-only env, not installing any additional packages)
in which I can just spawn an ad-hoc worker
Can you elaborate on what you would do with it? Like an OS environment variable that disables the entire setup itself? Will it clone the code base?
That depends on the HPO algorithm. Basically, they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It might also be a Bayesian process, i.e. based on the previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
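To illustrate the "concurrent jobs" limit, and why the cloned Tasks never need to know what was already dispatched (the optimizer keeps that state), here is a toy grid-search dispatcher in plain Python. The class and method names are hypothetical; ClearML's actual optimizer classes differ.

```python
import itertools
from collections import deque


class ToyGridDispatcher:
    """Toy model (not ClearML's code): the optimizer, not the worker Tasks,
    owns the search state and decides what to dispatch next."""

    def __init__(self, space: dict, max_concurrent: int):
        keys = list(space)
        # every combination of the grid, in a fixed order
        self.pending = deque(
            dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys)))
        self.max_concurrent = max_concurrent
        self.running = []
        self.done = []

    def dispatch(self) -> list:
        """Push new jobs until the concurrency limit is reached."""
        pushed = []
        while self.pending and len(self.running) < self.max_concurrent:
            params = self.pending.popleft()
            self.running.append(params)
            pushed.append(params)
        return pushed

    def complete(self, params: dict, metric: float) -> None:
        # a finished job frees a slot; a stateful (e.g. Bayesian) optimizer
        # would also use `metric` here to choose what to try next
        self.running.remove(params)
        self.done.append((params, metric))


d = ToyGridDispatcher({"lr": [0.1, 0.01], "bs": [32, 64]}, max_concurrent=2)
first = d.dispatch()        # only 2 of the 4 combinations are queued
d.complete(first[0], 0.9)
second = d.dispatch()       # one slot freed, so one more is queued
print(len(first), len(second), len(d.pending))  # 2 1 1
```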
Makes sense?
lol great hack. I'll check it out.
Although I'd be really happy if there was a solution in which I can just spawn an ad-hoc worker 🙂
I just don't fully understand the internals of an HPO process. If I create an Optimizer task with a simple grid search, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
Of course conda needs to be installed; it is using a pre-existing conda env, no?! What am I missing?
it's not a conda env, just a regular venv (poetry in this specific case)
And the assumption is that the code is also there?
Yes. The user is responsible for the entire setup. The agent just executes `python <path to script> <current hpo args>`
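The proposed worker step (run the script with the current arguments on the command line) fits in a few lines of Python, assuming the arguments are passed as `--name value` pairs; `build_command` and `train.py` are hypothetical names. Using `sys.executable` keeps the child process in the same pre-existing venv as the caller.

```python
import sys


def build_command(script: str, hpo_args: dict) -> list:
    """Hypothetical helper: turn one HPO parameter combination into an argv,
    run with the interpreter of the current (user-managed) venv."""
    argv = [sys.executable, script]
    for name, value in hpo_args.items():
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_command("train.py", {"lr": 0.01, "batch-size": 64})
# subprocess.run(cmd, check=True) would then execute it in place
```

Note that in ClearML's own flow the arguments are not passed on the command line at all; they are read back from the Task at runtime.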
I still see things being installed when the experiment starts. Why does that happen?
This only means that no new venv is created; it basically means packages are installed into the "default" Python env (usually whatever is preset inside the docker)
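To see which environment a run actually ends up in (and therefore where those packages go), the script can log its interpreter. A small standard-library sketch; comparing `sys.prefix` with `sys.base_prefix` detects standard `venv`/`virtualenv`-style environments (Poetry's included), not conda envs.

```python
import sys


def describe_interpreter() -> dict:
    """Report which Python environment the current process runs in."""
    return {
        "executable": sys.executable,
        # inside a venv, sys.prefix points at the venv while
        # sys.base_prefix points at the underlying installation
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
    }


print(describe_interpreter())
```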
Makes sense?
Why would you skip the entire Python env setup? Did you turn on the venvs cache? (basically caching the entire venv, even if running inside a container)
Regardless, it would be very convenient to add a flag to the agent that points it to an existing virtual environment and bypasses the entire setup process. This would facilitate ramping up new users to clearml who don't want the bells and whistles and would just want a simple HPO from an existing env (which may not even exist as part of a git repo)
Can you elaborate on what you would do with it? Like an OS environment variable that disables the entire setup itself? Will it clone the code base?
It will not do any setup steps. Ideally it would just pull an experiment from a dedicated HPO queue and run it in place
This is an agent setting, not related to any specific task
how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
A bit about how clearml-agent works (and actually how clearml itself works):
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. it will send arguments/parameters to the server). This includes logging the argparser, for example (and any other part of the auto-magic or manual connect).
When running via an agent (or a simulated agent run, see passing OS env), the auto-magic parts work the other way around: instead of logging parameters to the Task, it takes the arguments from the Task and puts them back into the code at runtime. For example, this means that even if a script with an argparser is executed without any command-line arguments, when the argparser is parsing the "cmd" it is actually getting the data from the server.
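The two directions can be captured in a toy model. Everything here is hypothetical: `toy_connect` is not a ClearML function, the `store` dict stands in for the Task on the backend, and `TOY_RUNNING_REMOTELY` is an invented flag playing the role of "running under an agent".

```python
import os


def toy_connect(params: dict, store: dict) -> dict:
    """Toy version of the auto-magic 'connect' behaviour (not ClearML's code).

    Manual run: local values are logged TO the task (store).
    Agent / simulated run: values are taken FROM the task into the code."""
    if os.environ.get("TOY_RUNNING_REMOTELY") == "1":
        params.update(store)   # the Task's values override local defaults
    else:
        store.update(params)   # local values are recorded on the Task
    return params


os.environ.pop("TOY_RUNNING_REMOTELY", None)
manual_store = {}
manual = toy_connect({"lr": 0.01}, manual_store)   # manual run: logs lr=0.01

os.environ["TOY_RUNNING_REMOTELY"] = "1"
remote = toy_connect({"lr": 0.01}, {"lr": 0.1})    # agent run: lr becomes 0.1
```

The same line of user code behaves differently only because of the environment it runs in, which is why the script itself needs no changes.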
Specifically for the HPO case, the HPO optimizer prepares copies (clones) of the original Task and changes the arguments on the Task object itself (i.e. stored on the backend). Then it pushes the Task into an execution queue. The agent pulls the Task from the execution queue, sets up the environment (the part you wish to skip) and just runs the code with a flag telling it that it is now running with an agent and should take the arguments from the server (in practice overriding the default parameters).
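The clone, override, enqueue, pull, run cycle can be modelled with an in-memory queue. This is a toy sketch: `ToyTask`, `clone`, `train` and `agent_step` are hypothetical stand-ins for the backend Task, the optimizer and the agent, and the environment-setup step is simply skipped.

```python
import copy
import itertools
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyTask:
    # Hypothetical in-memory stand-in for a Task record on the server
    task_id: str
    params: dict = field(default_factory=dict)
    status: str = "created"
    result: Optional[float] = None


_ids = itertools.count(1)


def clone(task: ToyTask, overrides: dict) -> ToyTask:
    """Optimizer side: copy the original Task and change its arguments."""
    params = copy.deepcopy(task.params)
    params.update(overrides)
    return ToyTask(task_id=f"clone-{next(_ids)}", params=params)


def train(lr: float) -> float:
    # stand-in for the user's script; it only sees the Task's arguments
    return round(1.0 - lr, 3)


def agent_step(queue: deque) -> ToyTask:
    """Agent side: pop a Task and run the code with the Task's arguments."""
    task = queue.popleft()
    task.status = "in_progress"
    task.result = train(**task.params)
    task.status = "completed"
    return task


base = ToyTask("base", params={"lr": 0.01})
queue = deque(clone(base, {"lr": lr}) for lr in (0.1, 0.2))
finished = [agent_step(queue) for _ in range(len(queue))]
print([(t.params["lr"], t.result) for t in finished])  # [(0.1, 0.9), (0.2, 0.8)]
```

The original Task is never modified; only its clones carry the new arguments.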
Does that make sense?
Try this one 🙂 `HyperParameterOptimizer.start_locally(...)`
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
the hack doesn't work if conda is not installed 😞
that was my next question 🙂
How does this design work with a stateful search algorithm?
You mean running everything on a single machine (manually)?
Yes, but not limited to this.
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
the hack doesn't work if conda is not installed 😞
Of course conda needs to be installed; it is using a pre-existing conda env, no?! What am I missing?
Ideally it would just pull an experiment from a dedicated HPO queue and run it in place
And the assumption is that the code is also there?
Oh if this is the case you can probably do
```python
import os
import subprocess
from time import sleep

from clearml.backend_api.session.client import APIClient

client = APIClient()
# find the queue this "dummy agent" listens on
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # pop the next Task from the queue, if there is one
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue

    task_id = result.entry.task
    # mark the Task as started on the server
    client.tasks.started(task=task_id)

    # run the script in the current environment (no venv setup, no code clone)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
    env['CLEARML_LOG_TASK_TO_BACKEND'] = '1'
    env['CLEARML_SIMULATE_REMOTE_TASK'] = '1'
    p = subprocess.Popen(args=["python", "my_script_here.py"], env=env)
    p.wait()
```
Explanation: This will pop a Task from `queue_name_here`, mark it as started, and call the Python process with magic environment variables telling it (1) that it should simulate an agent and (2) which Task it is running (i.e. the Task ID)
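If you want to check the part of that loop that prepares the child process's environment without a ClearML server, it can be factored out into a pure function. `build_task_env` is a hypothetical name; the three variable names are the ones the loop already sets.

```python
import os


def build_task_env(task_id: str, base_env=None) -> dict:
    """Hypothetical helper: the child-process environment used by the
    dummy-agent loop, factored out so it can be tested offline."""
    # copy, so the parent's os.environ is never modified between Tasks
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_TASK_ID"] = task_id
    env["CLEARML_LOG_TASK_TO_BACKEND"] = "1"
    env["CLEARML_SIMULATE_REMOTE_TASK"] = "1"
    return env


env = build_task_env("abc123", base_env={"PATH": "/usr/bin"})
```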
ExcitedFish86 wdyt?