Hi All, I Am Trying To Execute Somewhat Custom Hpo Scheme With Clearml. I Would Want That A Single Running Python Script Will Be Able To Sample The Optimizer, Init A Task And Report The Result Multiple Times. I Didn'T Find Anything Similar In The Docs Or

Unanswered

we have some other parts, and for some cases we get initialization time can be about 10 times the experiment time

Before I dive into some agent in agent hacking, I would consider "caching" this preprocessing on an auxiliary Task as an artifact. Basically add another argument for the auxiliary Task, and fetch the data from it (obviously you will need to run it once before the optimizer launches the first experiment).
Now that is out of the way (which really would be the preferred engineering solution) 🙂

This sounds like it can work. we are talking about something like:

Exactly!
In order to do that we have a new "agent-Task" that we manually enqueue (this controls the number of machines that will be running the code). You can see below an "agent-Task" pulling Tasks from "default" queue and spawning them as subprocess (one process per agent-task). Notice I have not been able to fully test the code, but you can run it manually and verify it actually works 🙂 (btw: no need for the LocalClearmlJob, from the optimizer perepective it just launches jobs on the "default" queue)
Let me know it works 🤞
` import sys
import os
import subprocess
import time
from clearml.backend_api.session.client import APIClient
from clearml import Task

def spawn_sub_task(task):
# create the subprocess
cmd = task.data.execution.script.entrypoint
python = sys.executable
env = dict(**os.environ)
env['CLEARML_TASK_ID'] = env['TRAINS_TASK_ID'] = task.id
env['CLEARML_LOG_TASK_TO_BACKEND'] = 1
env['CLEARML_SIMULATE_REMOTE_TASK'] = 1
p = subprocess.Popen(args=[python, cmd], cwd=os.getcwd(), env=env)
p.wait()
return True

task = Task.init('project', 'agent task')
params = {'queue_name': 'default'}
task.connect(params)

c = APIClient()
queue_id = c.queues.get_all(name=params['queue_name'])[0].id

while True:
result = c.queues.get_next_task(queue=queue_id)
if not result or not result.entry:
time.sleep(5)
continue
run_task = Task.get_task(task_id=result.entry.task)
spawn_sub_task(run_task) `

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

157 Views

0 Answers

3 years ago

one year ago