How Many Workers Can the Server Handle? (e.g. 200 Agents, Server Machine with Around 64GB)

Answered

Hi, do you know roughly how many workers the server can handle? Have you run any such stress tests?
The thing is that we have enough compute power to run a lot of agents (for example, 200 agents). Will the server handle that?
Suppose that the machine the server runs on has around 64GB.

  				
Posted by RoundMosquito25, 2 years ago


Answers 13

That depends on what the workers are doing, but in general such a spec should definitely work.

  				
Posted by SuccessfulKoala55, 2 years ago

Do we even have an option to assign an ID to each agent? https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_daemon
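One possible way to give each agent its own ID, assuming your clearml-agent version reads the `CLEARML_WORKER_ID` environment variable (it is listed among the agent's environment variables, but worth double-checking for your version), is to start every daemon with a different value. The sketch below only builds the command/environment pairs; the `build_agent_launches` helper and the `<prefix>:agent-<i>` naming scheme are hypothetical.

```python
import os
import socket
from typing import Dict, List, Tuple


def build_agent_launches(
    n_agents: int, queue: str, prefix: str = ""
) -> List[Tuple[List[str], Dict[str, str]]]:
    """Build one (command, environment) pair per agent, each with a distinct worker ID.

    Hypothetical helper: the "<prefix>:agent-<i>" ID scheme is just an example.
    """
    prefix = prefix or socket.gethostname()
    launches = []
    for i in range(n_agents):
        env = dict(os.environ)
        # Assumption: clearml-agent takes its worker ID from CLEARML_WORKER_ID.
        env["CLEARML_WORKER_ID"] = f"{prefix}:agent-{i}"
        # --detached runs the daemon in the background so the launcher can continue.
        cmd = ["clearml-agent", "daemon", "--queue", queue, "--detached"]
        launches.append((cmd, env))
    return launches
```

Each pair could then be started with `subprocess.run(cmd, env=env, check=True)`; because of `--detached`, each call should return once that agent is running in the background.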

  				
Posted by RoundMosquito25, 2 years ago

What deployment are you using? Docker-compose?

  				
Posted by SuccessfulKoala55, 2 years ago

SuccessfulKoala55 we deployed it using the default docker-compose file.

Is there a way to give the server more resources to help it somehow?

  				
Posted by RoundMosquito25, 2 years ago

Well, my first question would be: what worker name/ID is assigned to each one? If several agents use the same ID, some of them might be hidden.
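To check the "same ID hides agents" theory, it can help to compare the IDs you intended to use with what the server actually reports. A minimal sketch with two hypothetical helpers; the registered IDs would come from something like `[w.id for w in client.workers.get_all()]` with the `APIClient` used elsewhere in this thread (assuming the returned worker objects expose an `.id` attribute).

```python
from collections import Counter
from typing import Iterable, List, Set


def find_id_collisions(planned_ids: Iterable[str]) -> List[str]:
    """Return IDs shared by more than one planned agent (these might show up as one worker)."""
    counts = Counter(planned_ids)
    return sorted(worker_id for worker_id, n in counts.items() if n > 1)


def missing_workers(planned_ids: Iterable[str], registered_ids: Iterable[str]) -> Set[str]:
    """Return planned worker IDs that the server does not currently list."""
    return set(planned_ids) - set(registered_ids)


# Example (requires a configured clearml client; worker objects assumed to expose .id):
# from clearml.backend_api.session.client import APIClient
# registered = [w.id for w in APIClient().workers.get_all()]
# print(missing_workers(planned, registered))
```

An empty result from `find_id_collisions` with a non-empty `missing_workers` would point away from duplicate IDs and toward agents that never registered.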

  				
Posted by SuccessfulKoala55, 2 years ago

OK, I think what you need to do is scale up the number of apiserver worker processes: pass the CLEARML_USE_GUNICORN=1 environment variable to the apiserver service. This should start 8 processes (by default) instead of one; see if it helps. By the way, while the number of processes can be set even higher, at some point I assume you'll start having issues with load on the elasticsearch service, which is not as easy to scale up.

  				
Posted by SuccessfulKoala55, 2 years ago

SuccessfulKoala55 could we run the server with more verbose logging?

  				
Posted by RoundMosquito25, 2 years ago

SuccessfulKoala55 hmm, we are trying to do something like that and we are running into problems. We are running a big hyperparameter optimization on 200 workers and some tasks are failing (with fewer workers they do not fail). The UI also has some problems with that. Maybe there are some settings that should be changed compared to the default configuration?

  				
Posted by RoundMosquito25, 2 years ago

You need to set that in the environment section of the apiserver service in the docker-compose.yaml file. And yes, you'll need to run docker-compose up again.
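As an alternative to editing the compose file in place (a different technique from the one suggested here), Docker Compose can merge a second file on top of the original. The sketch below renders such an override file. The helper names and the `docker-compose.override.yml` file name are hypothetical, and the optional `CLEARML_GUNICORN_WORKERS` variable is an assumption about how the apiserver entrypoint reads the process count, so check it against your server version before relying on it.

```python
from pathlib import Path
from typing import Optional


def render_apiserver_override(
    gunicorn_workers: Optional[int] = None, compose_version: Optional[str] = None
) -> str:
    """Render a Compose override file that sets CLEARML_USE_GUNICORN=1 on the apiserver."""
    lines = []
    if compose_version is not None:
        # Older docker-compose releases may require this to match the main file's version key.
        lines.append(f'version: "{compose_version}"')
    lines += [
        "services:",
        "  apiserver:",
        "    environment:",
        '      CLEARML_USE_GUNICORN: "1"',
    ]
    if gunicorn_workers is not None:
        # Assumption: the apiserver entrypoint reads the process count from this variable.
        lines.append(f'      CLEARML_GUNICORN_WORKERS: "{gunicorn_workers}"')
    return "\n".join(lines) + "\n"


def write_override(path: Path, **kwargs) -> None:
    """Write the rendered override file to the given path."""
    path.write_text(render_apiserver_override(**kwargs))
```

For example, after `write_override(Path("/opt/clearml/docker-compose.override.yml"))` you would run `docker-compose -f /opt/clearml/docker-compose.yml -f /opt/clearml/docker-compose.override.yml up -d`. Compose merges the `environment` entries of both files, so the existing apiserver settings stay in place; the override is only picked up automatically when you run without any `-f` flag from the directory containing both files.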

  				
Posted by SuccessfulKoala55, 2 years ago

Yes, docker-compose.

  				
Posted by RoundMosquito25, 2 years ago

SuccessfulKoala55 How should I pass this variable? Do I need to create an apiserver.conf file in the /opt/clearml/config folder and write just CLEARML_USE_GUNICORN=1 there? Do I need to restart the server after that?

  				
Posted by RoundMosquito25, 2 years ago

SuccessfulKoala55 We are encountering a strange problem. We are spinning up N agents using a script, in a loop.

But not all agents are visible as workers (we check both in the UI and by running workers_list = client.workers.get_all() ).

Do you think it is possible that too many of them are connecting at once, and that we can solve this by adding a delay between starting subsequent agents?
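If you want to try the delay idea, a sketch along these lines starts agents one at a time and then polls until they all show up or a timeout expires. `launch_staggered` is hypothetical, and `launch_one` and `list_registered` are injected so the loop can be tested without a server. In practice `launch_one` might wrap `subprocess.run` with a per-agent `CLEARML_WORKER_ID`, and `list_registered` might return `[w.id for w in client.workers.get_all()]`. The default 2-second delay and 120-second timeout are arbitrary placeholders, not ClearML values, and this thread does not confirm that staggering fixes the problem.

```python
import time
from typing import Callable, Iterable, Set


def launch_staggered(
    worker_ids: Iterable[str],
    launch_one: Callable[[str], None],
    list_registered: Callable[[], Iterable[str]],
    delay_s: float = 2.0,
    settle_timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Start agents one by one with a pause between them, then wait for them to register.

    Returns the worker IDs that still have not appeared when the timeout expires.
    """
    planned = list(worker_ids)
    for i, worker_id in enumerate(planned):
        launch_one(worker_id)
        if i < len(planned) - 1:
            sleep(delay_s)  # spread out the agents' initial registration with the server
    deadline = clock() + settle_timeout_s
    while True:
        missing = set(planned) - set(list_registered())
        if not missing or clock() >= deadline:
            return missing
        sleep(poll_s)
```

If the returned set stays non-empty even with a generous delay, that would suggest the missing workers are not caused by agents connecting at the same moment.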

  				
Posted by RoundMosquito25, 2 years ago

Did you change anything in the compose file or are you using the default settings?

  				
Posted by SuccessfulKoala55, 2 years ago
