Answered

Hi, do you know roughly how many workers the server can handle? Have you run any stress tests?
The thing is, we want to run a large number of agents (for example, 200 agents). Will the server handle that?
Suppose the machine the server runs on has around 64GB of RAM.

  
  
Posted 2 years ago

Answers 13


OK, I think what you need to do is scale up the number of apiserver worker processes: pass the CLEARML_USE_GUNICORN=1 environment variable to the apiserver service. This should start 8 processes (by default) instead of one; see if it helps. By the way, while the number of processes can be set even higher, at some point I assume you'll start running into load issues on the Elasticsearch service, which is not as easy to scale up.

  
  
Posted 2 years ago

SuccessfulKoala55 We are encountering a strange problem. We are spinning up N agents in a loop using a script,

but not all of the agents are visible as workers (we checked both in the UI and by running workers_list = client.workers.get_all()).

Do you think it's possible that too many of them are connecting at once, and that we could solve this by adding a delay between launching subsequent agents?
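A minimal sketch of the staggered launch described above, assuming the agents are started locally with `clearml-agent daemon --queue <queue> --detached`. Each agent gets its own worker ID through the `CLEARML_WORKER_ID` environment variable, so that agents on the same host do not end up sharing one ID. The ID prefix `hpo-agent`, the 2-second default delay, and the function names are hypothetical choices for illustration, not values from the thread.

```python
import os
import subprocess
import time
from typing import Callable, Dict, List


def build_agent_command(queue: str) -> List[str]:
    # Standard daemon invocation; --detached puts the agent in the background.
    return ["clearml-agent", "daemon", "--queue", queue, "--detached"]


def launch_agents(
    n: int,
    queue: str,
    delay_s: float = 2.0,                       # hypothetical pause between launches
    id_prefix: str = "hpo-agent",               # hypothetical naming scheme
    run: Callable[..., object] = subprocess.Popen,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start n agents one at a time, each with a unique worker ID,
    waiting delay_s seconds between launches. Returns the IDs used."""
    worker_ids: List[str] = []
    for i in range(n):
        worker_id = f"{id_prefix}-{i:03d}"
        env: Dict[str, str] = dict(os.environ)
        # Explicit worker ID instead of the default, host-derived one.
        env["CLEARML_WORKER_ID"] = worker_id
        run(build_agent_command(queue), env=env)
        worker_ids.append(worker_id)
        if i < n - 1:
            sleep(delay_s)  # no pause needed after the last agent
    return worker_ids
```

For example, `launch_agents(200, "default")` would start 200 agents with IDs `hpo-agent-000` through `hpo-agent-199`. The `run` and `sleep` parameters exist only so the loop can be exercised without actually starting agents.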

  
  
Posted 2 years ago

SuccessfulKoala55 Hmm, we are trying to do something like that and are running into problems. We are running a large hyperparameter optimization on 200 workers, and some tasks are failing (with fewer workers they don't fail). The UI also has some problems with that. Maybe some settings need to be changed compared to the standard configuration?

  
  
Posted 2 years ago

SuccessfulKoala55 How should I pass this variable? Do I need to create a file apiserver.conf in /opt/clearml/config containing just CLEARML_USE_GUNICORN=1? Do I need to restart the server after that?

  
  
Posted 2 years ago

You need to set it in the environment section of the apiserver service in the docker-compose.yaml file. And yes, you'll need to run docker-compose up again.
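A small sketch of the change described above, applied to the compose file after it has been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`; that loading step is an assumption and not shown). It adds `CLEARML_USE_GUNICORN` to the `apiserver` service's `environment` section and handles both forms Compose accepts: a `KEY: value` mapping and a `- KEY=value` list. The function name is hypothetical.

```python
from typing import Any, Dict

KEY = "CLEARML_USE_GUNICORN"


def enable_gunicorn(compose: Dict[str, Any], service: str = "apiserver") -> Dict[str, Any]:
    """Set CLEARML_USE_GUNICORN=1 in the given service's environment section.
    `compose` is the parsed docker-compose.yaml; it is modified in place."""
    svc = compose["services"][service]
    env = svc.get("environment")
    if env is None:
        # No environment section yet: create one in mapping form.
        svc["environment"] = {KEY: "1"}
    elif isinstance(env, dict):
        # Mapping form:  CLEARML_USE_GUNICORN: "1"
        env[KEY] = "1"
    else:
        # List form:  - CLEARML_USE_GUNICORN=1  (drop any earlier entry first)
        env[:] = [e for e in env if not (str(e) == KEY or str(e).startswith(KEY + "="))]
        env.append(f"{KEY}=1")
    return compose
```

Editing by hand achieves the same thing: add `CLEARML_USE_GUNICORN: "1"` under `environment:` of the `apiserver` service. If you save the file with `yaml.safe_dump`, note that the YAML comments in the original file are lost. Either way, run `docker-compose up -d` afterwards so the apiserver container is recreated with the new variable.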

  
  
Posted 2 years ago

Do we even have an option to assign an ID to each agent? https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_daemon

  
  
Posted 2 years ago

Yes, docker-compose.

  
  
Posted 2 years ago

SuccessfulKoala55 We used the default docker-compose file.

Is there a way to give the server more resources to help it?

  
  
Posted 2 years ago

That depends on what the workers are doing, but in general such a spec should definitely work

  
  
Posted 2 years ago

Did you change anything in the compose file or are you using the default settings?

  
  
Posted 2 years ago

SuccessfulKoala55 Could we run the server with verbose logging?

  
  
Posted 2 years ago

What deployment are you using? Docker-compose?

  
  
Posted 2 years ago

Well, my first question would be: what worker name/ID is assigned to each one? Using the same ID might hide some of them.
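One way to test the duplicate-ID suspicion is to compare the IDs you launched with the ones the server reports. The sketch below assumes you kept the list of IDs used at launch time. `diff_workers` is a hypothetical helper. `fetch_registered_ids` relies on the `APIClient` call already used in this thread and assumes each returned worker object exposes an `id` attribute. It also needs the `clearml` package and configured credentials, so it is only defined here, not called.

```python
from collections import Counter
from typing import Dict, Iterable, List


def diff_workers(expected: Iterable[str], registered: Iterable[str]) -> Dict[str, List[str]]:
    """Compare launched worker IDs against the IDs the server reports."""
    expected_list = list(expected)
    expected_set = set(expected_list)
    registered_set = set(registered)
    counts = Counter(expected_list)
    return {
        # Launched, but not listed by the server.
        "missing": sorted(expected_set - registered_set),
        # Used by more than one agent; these may show up as a single worker.
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
        # Listed by the server, but not in the launched set.
        "unexpected": sorted(registered_set - expected_set),
    }


def fetch_registered_ids() -> List[str]:
    # Assumption: worker objects returned by workers.get_all() have an `id`.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    return [w.id for w in client.workers.get_all()]
```

A non-empty `duplicated` list would support the "same ID hides some workers" explanation. If the IDs are all unique and some are still `missing`, the server-side load issues discussed above become more likely.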

  
  
Posted 2 years ago