ClearML FAQ | Hi I Want To Have Several Boards Connected To The Same Experiment Manager, And Have Agents On The Manager Using These Boards, One Agent For Each Board. I Thought That If I Know What The Agent Is, I Can Assign One Board Per Agent

Answered


Hi
I want to have several boards connected to the same experiment manager, and have agents on the manager using these boards, one agent for each board.
I thought that if I know what the agent is, I can assign one board per agent - if the agent is 1, then use IP X, if 2, use IP Y, and so on.

My questions are:

1. Can I know, while running a task, which agent is running it?
2. Can I give an agent a custom name when spinning it up?

  				
Posted one year ago by SmallPigeon24


Answers 11

@SmallPigeon24 I assume by "board" you mean "queue"?

  				
Posted one year ago by SuccessfulKoala55

Hi @SmallPigeon24, to your questions:

1. Yes. It's under 'worker name'.
2. You can set the worker name in clearml.conf with agent.worker_name.
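
As a rough illustration of the second point, here is a minimal sketch of starting one named agent per board from Python. It assumes the `CLEARML_WORKER_NAME` environment variable overrides `agent.worker_name` (check this against your clearml-agent version); the helper names, the queue name, and the worker names are hypothetical.

```python
import os
import subprocess


def agent_env(worker_name, base_env=None):
    """Return a copy of the environment with the agent's worker name set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: CLEARML_WORKER_NAME overrides agent.worker_name in clearml.conf.
    env["CLEARML_WORKER_NAME"] = worker_name
    return env


def agent_command(queue):
    """Build the command line for a detached agent listening on one queue."""
    return ["clearml-agent", "daemon", "--queue", queue, "--detached"]


def start_agents(worker_names, queue):
    """Start one agent per worker name (hypothetical helper, one agent per board)."""
    for name in worker_names:
        subprocess.run(agent_command(queue), env=agent_env(name), check=True)
```

For example, `start_agents(["board-1", "board-2"], "boards")` would launch two agents on a hypothetical queue called `boards`, each reporting under its own worker name.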

  				
Posted one year ago by CostlyOstrich36

@SuccessfulKoala55 No, I mean a chip: a piece of hardware that cannot run an agent on its own, so an attached computer (in this case, the server) runs an agent that accesses it via SSH. In my case, I want to have a "board farm", that is, multiple boards for running inference, and I'd like to have them all connected to the same server.

  				
Posted one year ago by SmallPigeon24

@CostlyOstrich36 Will it work? Assume I have 3 workers.
Worker 1 takes a task
Worker 2 takes a task
Worker 1 checks last_worker

  				
Posted one year ago by SmallPigeon24

@SmallPigeon24 You can use the task's last_worker property.
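
A minimal sketch of reading that property from inside a running task and mapping it to a board. It assumes the backend task object exposed as `task.data` carries a `last_worker` field, and that the worker ID looks like `<worker_name>:<suffix>`; the lookup table, the addresses, and the function names are hypothetical.

```python
# Hypothetical worker-name -> board address table (documentation-range addresses).
BOARD_BY_WORKER = {
    "board-1": "192.0.2.11",
    "board-2": "192.0.2.12",
}


def board_for_worker(worker_id, boards=BOARD_BY_WORKER):
    """Map a worker ID such as 'board-1:0' to its board address."""
    # Assumption: the worker ID has the form '<worker_name>:<suffix>'.
    worker_name = worker_id.split(":", 1)[0]
    if worker_name not in boards:
        raise LookupError(f"no board configured for worker {worker_id!r}")
    return boards[worker_name]


def current_board():
    """Return the board address for the agent running the current task."""
    # Imported lazily so the mapping logic can be used without ClearML installed.
    from clearml import Task

    task = Task.current_task()
    if task is None:
        raise RuntimeError("no ClearML task is active in this process")
    task.reload()  # refresh the local copy from the server
    worker_id = task.data.last_worker  # assumption: set by the agent running the task
    if not worker_id:
        raise RuntimeError("task has no last_worker (was it run by an agent?)")
    return board_for_worker(worker_id)
```

Keeping the mapping in a plain function separate from the ClearML call makes it easy to check the worker-to-board table without a server.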

  				
Posted one year ago by SuccessfulKoala55

@SuccessfulKoala55 I disagree. Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
This is necessary for board farms, or any non-tiny scale of work.

  				
Posted one year ago by SmallPigeon24

@SmallPigeon24 Why would two workers take the same task? Only one agent can run a given task, so last_worker will always represent the agent running that task.

  				
Posted one year ago by SuccessfulKoala55

Will it get 1 or 2?

  				
Posted one year ago by SmallPigeon24

@SmallPigeon24 The intent behind queues supporting multiple workers is to have many consumers for that queue ("producer"). It does not mean multiple workers can pull the same task from the queue.

  				
Posted one year ago by SuccessfulKoala55

Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.

@SmallPigeon24 As long as these are the exact same instances, you can have them running simultaneously (think multi-node training). That said, each one should "know" not to report over the others, because otherwise it will overwrite their reports.

Back to your point on multiple agents:
You cannot have the same Task in a queue twice, which means that when a single agent pulls a task, that agent needs to "spin it" multiple times on the "remote boards".

Another option is creating multiple Tasks (i.e. cloning), then using user properties or hyperparameters to tell the code where to report to (of course this implies calling the Logger class manually, because the auto-logging goes directly to the main task).
To me this seems the most logical, as it lets you debug the individual executions as well as have a combined view of the metrics.
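
A minimal sketch of the cloning option, assuming one clone per board with the board address passed as a hyperparameter. The parameter name `General/board_ip`, the naming scheme, and both function names are hypothetical; `Task.get_task`, `Task.clone`, `set_parameter`, and `Task.enqueue` are the ClearML SDK calls this relies on.

```python
def plan_clones(base_name, board_ips):
    """Return one (task name, parameters) pair per board."""
    return [
        (f"{base_name} [board {index}]", {"General/board_ip": ip})
        for index, ip in enumerate(board_ips, start=1)
    ]


def clone_and_enqueue(template_task_id, board_ips, queue_name):
    """Clone a template task once per board and enqueue every clone."""
    # Imported lazily so the planning step can be checked without ClearML installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    clones = []
    for name, params in plan_clones(template.name, board_ips):
        clone = Task.clone(source_task=template, name=name)
        for key, value in params.items():
            # The task code reads this parameter to decide which board to use.
            clone.set_parameter(key, value)
        Task.enqueue(clone, queue_name=queue_name)
        clones.append(clone)
    return clones
```

Each clone then runs as its own task on whichever agent pulls it, and the task code can read the board address from its parameters instead of inferring it from the agent.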

Lastly, you can always use the comparison feature to have all the individual metrics on the same graph.

wdyt ?

  				
Posted one year ago by AgitatedDove14

@CostlyOstrich36 Hi!
Thank you very much for the informative answer.
I have a follow-up question on q.1: Is there a pythonic way to retrieve that info mid-run?

  				
Posted one year ago by SmallPigeon24
