Hello. I have an issue I can't seem to debug. Maybe someone knows how to fix it. I have two scripts

Answered

Hello. I have an issue I can't seem to debug. Maybe someone knows how to fix it.

I have two scripts.

enque_task.py, which schedules a task:

    ...
    task = clearml.Task.create(
        project_name=PROJECT_NAME,
        task_name=name,
        task_type=task_type,
        docker=docker,
        repo=REPO_URL,
        commit=commit,
        docker_args="--shm-size=8g",
        script="train.py",
    )
    clearml.Task.enqueue(task, queue_name=queue)
    ...

train.py, which launches training:

    clearml_args = config.get("clearml", {"project_name": PROJECT_NAME, "task_name": "Unnamed experiment"})
    task = Task.init(**clearml_args)

clearml_args is a dict loaded from a config file stored in the repo. It includes fields like remote_uri and tags.

When I launch train.py locally, everything is good: remote_uri and tags are set and all works fine.
But when I run enque_task.py and the scheduled task is executed on a worker, remote_uri and tags are not set, even though Task.init is called with the same parameters as when I run it locally.

Can you tell me what I am doing wrong?

  				
Posted 2 years ago by CharmingStarfish14


Answers 12

Can you please open a GitHub issue?

  				
Posted 2 years ago by SuccessfulKoala55

Maybe this is something we can add 🙂

  				
Posted 2 years ago by SuccessfulKoala55

@SuccessfulKoala55

So basically my problem was that I couldn't specify output_uri with Task.create.

I ended up using the clearml-task CLI instead, which allows specifying output_uri (but not tags, though).
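One alternative that is not tried in this thread, given here only as an untested sketch: store the settings on the draft task itself before enqueuing it. It assumes the SDK's `Task.output_uri` property setter and `Task.add_tags` method work on a task returned by `Task.create`, and that the agent then reads those stored values. `configure_draft` is a hypothetical helper name, and the destination must be reachable from the machine running the script, since the SDK may check write access when the URI is set.

```python
from typing import Any, Sequence


def configure_draft(task: Any, output_uri: str, tags: Sequence[str]) -> Any:
    """Store output_uri and tags on a draft task before it is enqueued.

    `task` is expected to be a clearml.Task returned by Task.create().
    Assumption: both values are saved on the server-side task, so the
    agent picks them up when it later runs train.py.
    """
    task.output_uri = output_uri   # property setter on clearml.Task
    task.add_tags(list(tags))      # appends tags, keeps existing ones
    return task


# Intended use in enque_task.py (needs a reachable ClearML server):
#   task = clearml.Task.create(...)
#   configure_draft(task, output_uri="...", tags=["tag"])
#   clearml.Task.enqueue(task, queue_name=queue)
```

The helper takes the task as an argument rather than importing clearml, so it can be tried against a stub object before pointing it at a real server.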

  				
Posted 2 years ago by CharmingStarfish14

@CharmingStarfish14 the explanation is very simple: this is not a bug, but part of how the ClearML SDK and Agent work.
When you run a task locally (or create it), everything that you provide is stored in the task metadata on the server.
When such a task is executed remotely by an agent (after you enqueued it), Task.init() is not ignored, it just does different things. Instead of storing all settings to the server, it reads all previously stored settings from the server and applies them to the task object/setup being run. This is part of the concept that allows you to create tasks from code, and then clone them and change their parameters/settings from the UI (or using the API/SDK) before scheduling the cloned tasks for remote execution.
Without this, tasks would be static constructs that always use the same configuration hard-coded in the Task.init() call (or other configurations) and could never be affected externally by the system.
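The behaviour described above can be illustrated with a toy model. This is not ClearML code: `FakeBackend` and `toy_init` are hypothetical names, and the dict only stands in for the task metadata kept on the server. It is a minimal sketch of the idea that a local run writes its arguments to the server, while a remote run reads back whatever is already stored.

```python
from typing import Any, Dict


class FakeBackend:
    """In-memory stand-in for the server-side task store (not a ClearML class)."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Dict[str, Any]] = {}


def toy_init(backend: FakeBackend, task_id: str, running_remotely: bool,
             **init_args: Any) -> Dict[str, Any]:
    """Toy version of Task.init(): returns the settings the run will use."""
    stored = backend.tasks.setdefault(task_id, {})
    if not running_remotely:
        # Local run: the arguments given in code are recorded on the server.
        stored.update(init_args)
    # Remote run: arguments in code are not written; stored settings win.
    return dict(stored)


backend = FakeBackend()

# Scenario 1: local run, arguments are stored and used.
print(toy_init(backend, "local-run", running_remotely=False,
               output_uri="s3://example-bucket", tags=["tag"]))
# {'output_uri': 's3://example-bucket', 'tags': ['tag']}

# Scenario 2: a draft created without output_uri or tags, then run by an agent.
backend.tasks["draft"] = {"docker": "python:3.10"}
print(toy_init(backend, "draft", running_remotely=True,
               output_uri="s3://example-bucket", tags=["tag"]))
# {'docker': 'python:3.10'}
```

In the second scenario the draft never had output_uri or tags stored, so the remote run ends up without them, which matches what the original question reports.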

  				
Posted 2 years ago by SuccessfulKoala55

CharmingStarfish14, maybe SuccessfulKoala55 can assist.

  				
Posted 2 years ago by CostlyOstrich36

CostlyOstrich36, it seems to be a critical bug.

Do you happen to know a support channel that can help with that?

  				
Posted 2 years ago by CharmingStarfish14

UPD: it doesn't solve anything 😞
This approach just creates a separate task corresponding to the enque_task.py script. The task that runs on the clearml agent still ignores output_uri 😞

  				
Posted 2 years ago by CharmingStarfish14

I figured out the problem.

Reason:
If you create a ClearML task and put it into a queue, all further Task.init call arguments on the ClearML worker will be ignored.

Solution:
enque_task.py

    task = clearml.Task.create(...)
    task.init(remote_uri=..., tags=...)
    clearml.Task.enqueue(task, queue_name=queue)

train.py

    task = Task.init(<whatever, all these args will be ignored>)

  				
Posted 2 years ago by CharmingStarfish14

> In the second case you run a script that creates a task with Task.create() which creates a draft task with execution parameters, output uri etc (Nothing in configuration I assume? Please check)

In the second case I only call Task.create, which specifies docker, repository, commit, script path and so on. It doesn't specify output_uri or tags; they should be set in train.py, when Task.init is called.

> Afterwards on the remote machine task is pulled by agent (is it running in docker mode?)

I provide docker and docker_args to Task.create. I believe that means I run it in docker mode, right?

> At which point were the init params changed in the second case?

In the second case the init params are not meant to be changed. They are meant to be set during execution of train.py by the clearml agent. But instead they are ignored completely.

  				
Posted 2 years ago by CharmingStarfish14

So in the first case you just run a task locally. In the second case you run a script that creates a task with Task.create(), which creates a draft task with execution parameters, output uri etc (Nothing in configuration I assume? Please check). This task is then enqueued and the script exits.

Afterwards on the remote machine task is pulled by agent (is it running in docker mode?) same code is pulled and execution begins.

At which point were the init params changed in the second case? From my understanding you just create a script that creates a task with the Task.init() (train.py) from repository. The code then runs and uses params in the cloned repo. What am I missing?

  				
Posted 2 years ago by CostlyOstrich36

I guess I can simplify it a little.
Basically there are two scenarios.

Scenario 1, on the local machine:

    task = Task.init(output_uri="...", tags=["tag"])

Result: everything works. The remote URI is used, tags are set.

Scenario 2, on the local machine:

    task = clearml.Task.create(...)
    clearml.Task.enqueue(task, queue_name=queue)

Then, on the remote clearml agent, the same code is called:

    task = Task.init(output_uri="...", tags=["tag"])

Result: init params are ignored, the remote URI is not set, tags are empty.

Does it make my problem description clearer? I'm not sure if it's a bug or if I'm missing something.

  				
Posted 2 years ago by CharmingStarfish14

Hi CharmingStarfish14, if I understand your issue correctly, I think it comes from the way the clearml-agent works. When running remotely, it uses the values stored on the backend. For example, if you take a task and clone it, and the task uses parameters from the repo that have since changed, the agent will still take the parameters that are logged in the ClearML backend. So for new parameters to take effect you need to clone the task, change the parameters in the cloned task (either in the UI or programmatically) and then enqueue the task.
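The clone, change, enqueue flow could look roughly like the sketch below. It assumes the SDK's `Task.get_task`, `Task.clone`, `Task.add_tags` and `Task.enqueue` calls; `clone_and_enqueue` and its parameters are hypothetical names chosen for illustration, and the task class is passed in as an argument so the function can be exercised without a server.

```python
from typing import Any, Optional, Sequence


def clone_and_enqueue(task_cls: Any, source_task_id: str, queue_name: str,
                      new_name: Optional[str] = None,
                      extra_tags: Sequence[str] = ()) -> Any:
    """Clone an existing task, adjust the clone, then enqueue it.

    `task_cls` is expected to be clearml.Task. Changes are made on the
    clone before enqueuing, because the agent uses the stored values.
    """
    source = task_cls.get_task(task_id=source_task_id)
    cloned = task_cls.clone(source_task=source, name=new_name)
    if extra_tags:
        cloned.add_tags(list(extra_tags))
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned


# Intended use (needs a reachable ClearML server):
#   from clearml import Task
#   clone_and_enqueue(Task, source_task_id="...", queue_name=queue,
#                     extra_tags=["tag"])
```

Other settings on the clone, such as hyperparameters, would be changed at the same point, after cloning and before enqueuing.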

What is your use case? I think Pipelines might be beneficial to your use case.

  				
Posted 2 years ago by CostlyOstrich36
