Answered
Hi, I Am Switching From Wandb To Clearml In My Pytorch Ddp Training Script. With Wandb I Used To Have Worker Nr 1 Handle Logging To Wandb And Initiating The Connection. If I Simply Exchange Wandb Calls With Clearml Calls, Worker Nr 1, Which Handles The Co

Hi, I am switching from WandB to ClearML in my PyTorch DDP training script. With WandB I used to have worker nr 1 handle logging to WandB and initiating the connection. If I simply exchange the WandB calls with ClearML calls, worker nr 1, which handles the connection, starts straggling behind in the validation and training loop, causing a massive slowdown of the training session, by up to a factor of 5. Is this normal, and if not, how should ClearML be used for logging in a PyTorch multi-GPU DDP training environment?

  
  
Posted 10 months ago

Answers 5


It seems that calling Task.init() in the main process before spawning the worker processes, and then calling Task.current_task() inside the rank 0 process, works with no slowdown.
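For anyone landing here later, a minimal sketch of that pattern follows. It is not a verified fix. It assumes workers are launched with torch.multiprocessing.spawn and use the gloo backend on a single machine. The project and task names, the address and port, and the placeholder loss are all made up for illustration. The clearml and torch imports sit inside the functions so the module loads even where they are not installed.

```python
import os


def train_loop(logger, steps=3):
    """Placeholder training loop; scalars are reported only if a logger is given."""
    losses = []
    for step in range(steps):
        loss = 1.0 / (step + 1)  # stand-in for a real training step
        losses.append(loss)
        if logger is not None:
            logger.report_scalar(
                title="train", series="loss", value=loss, iteration=step
            )
    return losses


def worker(rank, world_size):
    # Local imports: the module can be loaded without torch/clearml present.
    import torch.distributed as dist
    from clearml import Task

    # Assumed single-machine rendezvous settings (illustrative values).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        # Rank 0 reuses the task created in main(); no second Task.init() here.
        logger = Task.current_task().get_logger() if rank == 0 else None
        train_loop(logger)
    finally:
        dist.destroy_process_group()


def main(world_size=2):
    import torch.multiprocessing as mp
    from clearml import Task

    # One task, created once in the parent before any worker is spawned.
    Task.init(project_name="ddp-demo", task_name="ddp-logging")  # hypothetical names
    # mp.spawn passes the process index (rank) as the first argument to worker.
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

To try it, call main() from under an `if __name__ == "__main__":` guard, which spawn-based multiprocessing needs anyway. Only rank 0 holds a logger, so the other ranks never touch the ClearML connection.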

  
  
Posted 10 months ago

It just puzzles me that the massive slowdown still happens when a single subprocess creates the task, so that only one task is active and it is only accessed by the worker that created it.

  
  
Posted 10 months ago

That makes sense. You should generally have only one task, initialized in the master process. The other subprocesses will inherit this task, which should speed up the process.

  
  
Posted 10 months ago

Hi Eugen,
Thanks for the response!
Let me know if you need a running example; it will take some effort to produce.

  
  
Posted 10 months ago

Hi StormySeaturtle98! Do you have a sample snippet that could help us diagnose this problem?

  
  
Posted 10 months ago