Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

Answered

Hi guys!
Is there a way to tell an agent to run a task in an existing venv (without creating a new one)?

  				
Posted 3 years ago by ExcitedFish86


Answers 30

great!
Is there a way to add this for an existing task's draft via the web UI?

  				
Posted 3 years ago by ExcitedFish86

This is an agent setting, not related to any specific task

  				
Posted 3 years ago by SuccessfulKoala55

ExcitedFish86 that said, if you are running in docker mode you can actually pass it on a per-Task basis with:
-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python
as an additional docker container argument on the Task "Execution" tab itself.

  				
Posted 3 years ago by AgitatedDove14

Thanks AgitatedDove14 . I'll try that

  				
Posted 3 years ago by ExcitedFish86

AgitatedDove14, I'm running an agent inside a Docker container (using the image on Docker Hub) and mounted the host's docker socket into it so the agent can start sibling containers. How do I set the config for this agent? Some options can be set through env vars, but not all of them 😞

  				
Posted 3 years ago by ExcitedFish86

ExcitedFish86

How do I set the config for this agent? Some options can be set through env vars but not all of them

Hmm okay, if you are running an agent inside a container and you want it to spin up "sibling" containers, you need to set the following:
1. Mount the docker socket into the container running the agent itself (as you did), basically adding "--privileged -v /var/run/docker.sock:/var/run/docker.sock"
2. Allow the agent to mount cache and configuration from the host into the sibling container: -e CLEARML_AGENT_DOCKER_HOST_MOUNT="/host/clearml/agent:/root/.clearml" -v /host/clearml/agent:/root/.clearml
3. Finally, you can map the clearml.conf from the host directly into the container with -v /host/clearml.conf:/root/clearml.conf

Putting the three together you should end up with something like:
docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -e "CLEARML_AGENT_DOCKER_HOST_MOUNT=/host/clearml/agent:/root/.clearml" -v "/host/clearml/agent:/root/.clearml" -v "/host/clearml.conf:/root/clearml.conf" clearml-agent
wdyt?
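
For reference, the same command can be assembled programmatically, which makes the three pieces easier to tell apart. This is only a minimal Python sketch: the host paths are the placeholders from the post above, the helper name `agent_docker_cmd` is made up for illustration, and the image name `clearml-agent` is an assumption standing in for whatever agent image you actually use.

```python
import shlex


def agent_docker_cmd(host_dir, host_conf, image):
    """Build a `docker run` argv for an agent that starts sibling containers.

    All arguments are placeholders supplied by the caller (hypothetical helper).
    """
    container_dir = "/root/.clearml"
    return [
        "docker", "run", "--rm", "--privileged",
        # 1. give the agent container access to the host's docker socket
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # 2. tell the agent which host folder backs its cache/config folder,
        #    and mount that same folder into the agent container
        "-e", f"CLEARML_AGENT_DOCKER_HOST_MOUNT={host_dir}:{container_dir}",
        "-v", f"{host_dir}:{container_dir}",
        # 3. map the host's clearml.conf into the agent container
        "-v", f"{host_conf}:/root/clearml.conf",
        image,
    ]


cmd = agent_docker_cmd("/host/clearml/agent", "/host/clearml.conf", "clearml-agent")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Keeping the command as a list (rather than one long string) avoids quoting mistakes if it is later passed to `subprocess.run`.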

BTW: any reason to run the agent inside a container and not on the host itself ?

  				
Posted 3 years ago by AgitatedDove14

just seems a bit cleaner and more DevOps/k8s friendly to work with the container version of the agent 🙂

  				
Posted 3 years ago by ExcitedFish86

another question - when running a non-dockerized agent and setting CLEARML_AGENT_SKIP_PIP_VENV_INSTALL , I still see things being installed when the experiment starts. Why does that happen?

  				
Posted 3 years ago by ExcitedFish86

I still see things being installed when the experiment starts. Why does that happen?

This only means that no new venv is created; packages are still installed, just into the "default" python env (usually whatever is preset inside the docker).
Make sense?
Why would you skip the entire python env setup? Did you turn on the venvs cache? (It basically caches the entire venv, even when running inside a container.)

  				
Posted 3 years ago by AgitatedDove14

I'm trying to achieve a workflow similar to the one in wandb for parameter sweep where there are no venvs involved other than the one created by the user 😅

  				
Posted 3 years ago by ExcitedFish86

It's a very convenient way of doing a parameter sweep with minimal setup effort

  				
Posted 3 years ago by ExcitedFish86

I'm trying to achieve a workflow similar to the one

You mean running everything on a single machine (manually)?

  				
Posted 3 years ago by AgitatedDove14

Try this one 🙂
HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
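
To make the suggestion concrete, here is a rough sketch of what a local sweep might look like. It assumes the `clearml` package is installed and configured, that a base Task already exists, and that its argparse parameters are named `Args/batch_size` and `Args/lr` and it reports a `validation` / `loss` scalar; those names, the project/task names, and the wrapper function `run_local_sweep` are all placeholders, not something taken from this thread. Check the linked reference for the exact behaviour of `start_locally` before relying on it.

```python
def run_local_sweep(base_task_id):
    """Run a small random search with the controller started locally (sketch)."""
    # Imported lazily so this module can be loaded where clearml is not installed.
    from clearml import Task
    from clearml.automation import (
        DiscreteParameterRange,
        HyperParameterOptimizer,
        RandomSearch,
        UniformIntegerParameterRange,
    )

    # The optimizer controller is itself tracked as a Task.
    Task.init(
        project_name="HPO",            # placeholder project name
        task_name="local sweep",       # placeholder task name
        task_type=Task.TaskTypes.optimizer,
    )

    optimizer = HyperParameterOptimizer(
        base_task_id=base_task_id,     # the Task that gets cloned per trial
        hyper_parameters=[
            UniformIntegerParameterRange(
                "Args/batch_size", min_value=16, max_value=64, step_size=16
            ),
            DiscreteParameterRange("Args/lr", values=[0.001, 0.01]),
        ],
        objective_metric_title="validation",   # assumed scalar title
        objective_metric_series="loss",        # assumed scalar series
        objective_metric_sign="min",
        optimizer_class=RandomSearch,
        max_number_of_concurrent_tasks=1,
        total_max_jobs=4,
    )

    optimizer.start_locally()  # run from this process instead of via an agent
    optimizer.wait()           # block until the search finishes
    optimizer.stop()
```

Calling `run_local_sweep("<base task id>")` from inside the already-prepared virtual environment is the intended usage.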

  				
Posted 3 years ago by AgitatedDove14

You mean running everything on a single machine (manually)?

Yes, but not limited to this.
I want to be able to install the venv on multiple servers and start the "simple" agents on each of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task

  				
Posted 3 years ago by ExcitedFish86

I want to be able to install the venv on multiple servers and start the "simple" agents on each of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task

ExcitedFish86 Oh, if this is the case:
in your clearml.conf:
agent.package_manager.type: conda
agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L79
Then in the Task "base docker image" you can write the path to your conda env 🙂 (it will treat it as a read-only env, not installing any additional packages)

  				
Posted 3 years ago by AgitatedDove14

lol great hack. I'll check it out.
Although I'd be really happy if there was a solution in which I can just spawn an ad-hoc worker 🙂

  				
Posted 3 years ago by ExcitedFish86

in which I can just spawn an ad-hoc worker

Can you elaborate on what you would do with it? Like an OS environment variable that disables the entire setup itself? Will it clone the code base?

  				
Posted 3 years ago by AgitatedDove14

the hack doesn't work if conda is not installed 😞

  				
Posted 3 years ago by ExcitedFish86

Can you elaborate on what you would do with it? Like an OS environment variable that disables the entire setup itself? Will it clone the code base?

It will not do any setup steps. Ideally it would just pull an experiment from a dedicated HPO queue and run it in place

  				
Posted 3 years ago by ExcitedFish86

the hack doesn't work if conda is not installed

Of course conda needs to be installed, it is using a pre-existing conda env, no?! What am I missing?

Ideally it would just pull an experiment from a dedicated HPO queue and run it in place

And the assumption is that the code is also there?

  				
Posted 3 years ago by AgitatedDove14

Of course conda needs to be installed, it is using a pre-existing conda env, no?! What am I missing?

It's not a conda env, just a regular venv (poetry in this specific case)

And the assumption is that the code is also there?

Yes. The user is responsible for the entire setup. The agent just executes python <path to script> <current hpo args>

  				
Posted 3 years ago by ExcitedFish86

Oh, if this is the case you can probably do:
```
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()

# look up the queue to pull Tasks from
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # pop the next Task from the queue; wait and retry if the queue is empty
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # mark the Task as started on the server
    client.tasks.started(task=task_id)
    # pass the Task ID and the "simulate an agent" flags to the child process
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
    env['CLEARML_LOG_TASK_TO_BACKEND'] = '1'
    env['CLEARML_SIMULATE_REMOTE_TASK'] = '1'
    p = subprocess.Popen(args=["python", "my_script_here.py"], env=env)
    p.wait()
```
Explanation: This will pop a Task from queue_name_here, mark it as started, and call the python process with magic environment variables telling it (1) that it should simulate an agent and (2) which Task it is running (i.e. the Task ID)
ExcitedFish86 wdyt?
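
One detail that matters for the original question (reusing an existing venv): if the polling loop is itself started from inside that venv, launching the child with `sys.executable` instead of a bare `python` keeps the Task in the same interpreter regardless of what `PATH` says. A small sketch of just that part follows; the helper names `simulated_agent_env` and `run_in_current_venv` are made up for illustration, and the three environment variables are the ones from the snippet above.

```python
import os
import subprocess
import sys


def simulated_agent_env(task_id, base_env=None):
    """Return a copy of the environment with the simulated-agent variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_TASK_ID"] = task_id
    env["CLEARML_LOG_TASK_TO_BACKEND"] = "1"
    env["CLEARML_SIMULATE_REMOTE_TASK"] = "1"
    return env


def run_in_current_venv(script_path, task_id):
    """Run the script with the interpreter that is running this loop."""
    completed = subprocess.run(
        [sys.executable, script_path],  # same interpreter, hence same venv
        env=simulated_agent_env(task_id),
    )
    return completed.returncode
```

The returned exit code can be used to decide whether to mark the Task as completed or failed; how to do that is left out here.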

  				
Posted 3 years ago by AgitatedDove14

How does this work with HPO?
Are the tasks generated in advance?

  				
Posted 3 years ago by ExcitedFish86

ExcitedFish86 this is a general "dummy agent" that takes Tasks and executes them (no env created, no code cloned, as you suggested)

How does this work with HPO?

The HPO clones Tasks, changes their arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multi-machine support. To overcome that, I posted a dummy agent that just runs the Tasks.
(Notice that the different Task arguments/hyperparameters are stored on the Task itself and are resolved at runtime by the code; you just need to tell it to do so. For example, it does not need to pass any arguments to the script in order to change the argparser values. This is done at runtime when calling parse_args and is transparent.)
Make sense?

  				
Posted 3 years ago by AgitatedDove14

I just don't fully understand the internals of an HPO process. If I create an Optimizer task with a simple grid search, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?

  				
Posted 3 years ago by ExcitedFish86

Regardless, it would be very convenient to add a flag to the agent that points it to an existing virtual environment and bypasses the entire setup process. This would make it easier to onboard new users to clearml who don't want the bells and whistles and would just like a simple HPO from an existing env (which may not even exist as part of a git repo)

  				
Posted 3 years ago by ExcitedFish86

how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?

A bit about how clearml-agent works (and actually about how clearml itself works).
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. it will send arguments/parameters to the server). This includes logging the argparser, for example (and any other part of the automagic or manual connect).
When running via an agent (or a simulated agent run, see passing the OS env above), the auto-magic parts work the other way around: instead of logging parameters to the Task, it takes the arguments from the Task and puts them back into the code at runtime. For example, this means that even if a script with an argparser is executed without any command line arguments, when the argparser parses the "cmd" it is actually getting the data from the server.

Specifically for the HPO case, the HPO optimizer prepares copies (clones) of the original Task and changes the arguments on the Task object itself (i.e. stored on the backend). Then it pushes the Task into an execution queue. The agent pulls the Task from the execution queue, sets up the environment (the part you wish to skip) and just runs the code with a flag telling it that it is now running with an agent and should take the arguments from the server (and in practice override the default parameters).

Does that make sense?
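
The two directions described above can be illustrated with a toy model. To be clear, this is not ClearML code and not how the library is implemented; `ToyBackend` and `connect` are invented names that only mimic the behaviour described in this post (manual run logs parameters to the server, agent run pulls them from the server).

```python
class ToyBackend:
    """Stand-in for the server: stores a parameter dict per task ID."""

    def __init__(self):
        self.params = {}


def connect(backend, task_id, defaults, running_remotely):
    """Mimic the described behaviour of connecting parameters to a Task."""
    if running_remotely:
        # Agent (or simulated agent) run: values stored on the Task
        # override whatever defaults the code was started with.
        merged = dict(defaults)
        merged.update(backend.params.get(task_id, {}))
        return merged
    # Manual run: record the values the code actually used.
    backend.params[task_id] = dict(defaults)
    return dict(defaults)


backend = ToyBackend()
defaults = {"lr": 0.1, "epochs": 5}

# 1. Manual run of the base experiment: parameters are logged to the backend.
connect(backend, "base", defaults, running_remotely=False)

# 2. The optimizer clones the Task and edits the stored arguments.
backend.params["clone-1"] = dict(backend.params["base"], lr=0.01)

# 3. An agent runs the clone: same script, same defaults, no CLI arguments,
#    yet the code sees the values stored on the clone.
resolved = connect(backend, "clone-1", defaults, running_remotely=True)
print(resolved)  # {'lr': 0.01, 'epochs': 5}
```

This is why the script launched by an agent does not need any hyperparameters on its command line.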

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Just to see that I understood correctly - in an HPO task, all subtasks (a specific parameter combination) are created and pushed to the relevant queue at the moment the main (HPO) task is created?

  				
Posted 3 years ago by ExcitedFish86

That depends on the HPO algorithm. Basically, they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It might also be a Bayesian process, i.e. the next parameters are chosen based on previous sets of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense?
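
As an illustration of the "concurrent jobs" limit, here is a toy dispatcher for a grid search. It is not ClearML's scheduler; the function names are invented, and "finishing" a job is simulated by popping the oldest running one. The point is only that parameter combinations are produced lazily and enqueued as slots free up, rather than all at creation time.

```python
from collections import deque
from itertools import product


def grid(space):
    """Lazily yield every parameter combination of a grid search."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def dispatch(space, max_concurrent):
    """Simulate enqueueing jobs without exceeding max_concurrent running jobs.

    Returns a log of (event, params, running_count) tuples.
    """
    pending = grid(space)   # combinations are generated on demand
    running = deque()
    log = []
    while True:
        # Fill free slots with new jobs.
        while len(running) < max_concurrent:
            params = next(pending, None)
            if params is None:
                break       # search space exhausted
            running.append(params)
            log.append(("enqueue", params, len(running)))
        if not running:
            return log      # nothing running and nothing left to enqueue
        # Pretend the oldest running job has finished, freeing a slot.
        finished = running.popleft()
        log.append(("done", finished, len(running)))


events = dispatch({"lr": [0.1, 0.01], "batch_size": [16, 32]}, max_concurrent=2)
for event in events:
    print(event)
```

A stateful (e.g. Bayesian) strategy would fit the same loop: instead of reading the next combination from a fixed grid, it would choose it at the moment a slot frees up, using the results of the jobs that have already finished.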

  				
Posted 3 years ago by AgitatedDove14

that was my next question 🙂
How does this design work with a stateful search algorithm?

  				
Posted 3 years ago by ExcitedFish86
