Answered
Hi, I would like to check what the recommended hardware specs are for a host running ClearML Server. I had one configured with 32 CPU cores and 64GB RAM, and I noticed that when there is a surge in remote task creation, the following delays occur.

Hi, I would like to check what the recommended hardware specs are for a host running ClearML Server.

I had one configured with 32 CPU cores and 64GB RAM, and I noticed that when there is a surge in remote task creation, the following delays occur:
1. Each individual task creation can be delayed for quite a while, compared to no delay when only one or two tasks are created.

task = Task.init(...)
task.set_base_docker("...")
task.execute_remotely(..., exit_process=True)

In the following client logs, execution could 'hang' for up to 20 secs on any line.

ClearML Task: created new task id=7563485622
ClearML results page: https://...../output/log
clearml.Task - INFO - Waiting for repository detection and full package analysis
clearml - WARNING - Switching to remote execution, output log page http://.../output/log
clearml - WARNING - Terminating local execution process

2. The ClearML web interface starts to lag significantly.

  
  
Posted 3 years ago

Answers 10


Hi SubstantialElk6

32 CPU cores, 64GB ram

Should be plenty. This sounds like a network bottleneck issue; I can't imagine the server is actually CPU bound.

  
  
Posted 3 years ago

We are running on a 1 Gbps backend.

  
  
Posted 3 years ago

SubstantialElk6, is this the issue?

  
  
Posted 3 years ago

If the only issue is this line:

task.execute_remotely(..., exit_process=True)

It has to finish the static analysis of the entire repository (which usually happens in the background, but now we have to wait for it). If the repo is large, this could actually take 20 sec (depending on the CPU/drive of the machine itself).
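To get a rough feel for how repository size affects this wait, the sketch below times a walk over a repository that parses every Python file and collects the packages it imports. It is only a stand-in for illustration, not the analysis ClearML itself performs, and the function name `time_import_scan` is hypothetical.

```python
import ast
import os
import time


def time_import_scan(repo_root):
    """Parse every .py file under repo_root and collect imported package names.

    Hypothetical helper: a rough proxy for the cost of a static package
    analysis, not the analysis ClearML itself performs.
    Returns (elapsed_seconds, number_of_files_parsed, set_of_package_names).
    """
    start = time.perf_counter()
    modules = set()
    parsed = 0
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    tree = ast.parse(fh.read(), filename=path)
            except (SyntaxError, ValueError, OSError):
                continue  # skip files that cannot be read or parsed
            parsed += 1
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    # "import os.path" counts as package "os"
                    modules.update(alias.name.split(".")[0] for alias in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                    # absolute "from x.y import z" counts as package "x";
                    # relative imports (level > 0) are ignored
                    modules.add(node.module.split(".")[0])
    return time.perf_counter() - start, parsed, modules


if __name__ == "__main__":
    elapsed, parsed, modules = time_import_scan(".")
    print(f"parsed {parsed} files in {elapsed:.2f}s, {len(modules)} distinct packages")
```

Running it from the repository root on the client machine gives a lower bound to compare against the delay seen in the logs. If the scan itself takes seconds, the wait is likely local rather than on the server side.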

  
  
Posted 3 years ago

no worries

  
  
Posted 3 years ago

The server is running only the ClearML components. Could you advise on the ELB part, how should we optimise it?

  
  
Posted 3 years ago

We are using k8s glue to spawn the job. ...

I think this is actual network latency and has nothing to do with the jobs. Could it be that the server is very far away?
What happens when you manually start a Task from your machine?
Is the latency fixed? Is it just when starting a new Task?
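One way to answer these questions is to sample the connection time from the client to the server, once while idle and once during a surge. The sketch below is a minimal example using only the standard library; the helper names are hypothetical, and the host and port in the usage comment are placeholders to replace with your own API server address.

```python
import socket
import statistics
import time


def tcp_connect_once(host, port, timeout=5.0):
    """Open and close one TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def sample_latency(action, repeats=10):
    """Call action() `repeats` times and return timing statistics in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Usage sketch (hypothetical host; the port is an assumption, use your own):
# print(sample_latency(lambda: tcp_connect_once("clearml.example.com", 8008)))
```

If the median stays flat but the maximum grows during a surge, that points to intermittent queuing somewhere between client and server rather than a fixed distance-related delay.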

  
  
Posted 3 years ago

Hi, I will have to get back to you on this. I need to check every client's repo to test your hypothesis.

  
  
Posted 3 years ago

Wait, I might be completely off.
Is this the line that "hangs"?

task.execute_remotely(..., exit_process=True)
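To pin down which line is slow, each call can be wrapped in a small timer. This is a minimal sketch; `timed` and `slowest_step` are hypothetical helper names, and the ClearML calls are shown only as comments because their arguments are elided in the original post.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slowest_step(results):
    """Return the (label, seconds) pair with the longest duration."""
    return max(results.items(), key=lambda item: item[1])


# Usage sketch, with the calls from the question left as comments:
#
# timings = {}
# with timed("init", timings):
#     task = Task.init(...)
# with timed("set_base_docker", timings):
#     task.set_base_docker("...")
# print(timings, flush=True)  # print first: the next call ends the local process
# task.execute_remotely(..., exit_process=True)
```

Since the last call terminates the local process, its duration is easiest to read from the gap between the printed timings and the "Terminating local execution process" log line.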

  
  
Posted 3 years ago

We are using k8s glue to spawn the job. Could you describe in detail the steps that take place when the above code executes?

  
  
Posted 3 years ago