Answered
https://clearml.slack.com/archives/CTK20V944/p1713357955958089


Answers 18


Thank you @<1523701949617147904:profile|PricklyRaven28>!!!
Let me see if we can reproduce it and figure out how to solve it.

  
  
Posted one month ago

Hi @<1523701949617147904:profile|PricklyRaven28>! Thank you for the example. We managed to reproduce it and will investigate further to figure out the issue.

  
  
Posted one month ago

Hi @<1523701435869433856:profile|SmugDolphin23>
Confirming that the rank 0 process no longer hangs with the new version!

The accelerate CLI problem still reproduces, though (it's in my demo).

  
  
Posted one month ago

@<1523701205467926528:profile|AgitatedDove14>
I only got some time to work on it now, and I created a small reproducible example.
I also tried your suggestion of importing accelerate, and it also had issues.

Overall, with debug_pipeline it works OK, but neither method works without it. I think it has something to do with wrapping accelerate.

Problem with launching through the Python module (your suggestion): argparse breaks.
Problem with launching a new process: the rank 0 process hangs and never finishes.

Both work fine with debug_pipeline.

  
  
Posted one month ago

To make it fully reproducible, I created a Dockerfile for it: run build_docker.sh and then run.sh.

  
  
Posted one month ago

Glad to hear you were able to reproduce it! Waiting for your reply 🙏

  
  
Posted one month ago

Interesting, I wasn't aware of this Python module for executing accelerate. I'll try to use that.

It's essentially the cmd line:
None
None

  
  
Posted 2 months ago

Hi @<1523701949617147904:profile|PricklyRaven28>
Sorry, we missed that one.

We need to invoke it with accelerate launch, so we use subprocess.run

So you have two options: either you change the script entry of the Task from your "script.py" to "-m accelerate launch script.py",
or you do it manually inside your entry point (i.e. call accelerate launch yourself).
BTW, I "think" we added an "auto detect" for it, so that if you launched it manually this way, it will know to register it as "-m accelerate launch ...".
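The second option above (calling accelerate launch yourself from inside the entry point or step function) can be sketched roughly as follows. This is a minimal illustration under assumptions: it assumes the `accelerate` console script is on PATH, and `build_accelerate_cmd`, `train_step`, `train.py`, and the `--epochs`/`--lr` arguments are hypothetical placeholders, not ClearML or Accelerate APIs. It is not ClearML's auto-detect mechanism.

```python
import subprocess
from typing import List, Sequence


def build_accelerate_cmd(script: str, script_args: Sequence[str] = (), num_processes: int = 1) -> List[str]:
    """Assemble an `accelerate launch` command line (hypothetical helper)."""
    return [
        "accelerate", "launch",
        "--num_processes", str(num_processes),  # real `accelerate launch` flag
        script,
        *script_args,  # forwarded unchanged to the training script
    ]


def train_step(script: str = "train.py") -> int:
    """Hypothetical pipeline-step body that shells out to `accelerate launch`."""
    cmd = build_accelerate_cmd(script, ["--epochs", "3"], num_processes=2)
    # check=True raises CalledProcessError if the launcher exits non-zero
    result = subprocess.run(cmd, check=True)
    return result.returncode


if __name__ == "__main__":
    # Only prints the command; nothing is launched here
    print(build_accelerate_cmd("train.py", ["--lr", "1e-4"], num_processes=2))
```

Building the argument list separately keeps the launch arguments easy to inspect and log before anything is spawned.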

  
  
Posted 2 months ago

How does this work in the context of a pipeline?

Is your pipeline built from functions/decorators, or from Tasks?
(If it's Tasks, then just change the entry point in the overrides.)
In the case of functions or decorators, you have to do it manually (i.e. your function needs to do "accelerate launch"):

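from accelerate.commands.launch import launch_command, launch_command_parser

# Build the same argument parser that the `accelerate launch` CLI uses
parser = launch_command_parser()
# Replace "-command -here" with the arguments you would pass to `accelerate launch`
args = parser.parse_args("-command -here".split())
# Launch the training script in-process, as `accelerate launch` would
launch_command(args)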
  
  
Posted 2 months ago

It's with decorators.

Interesting, I wasn't aware of this Python module for executing accelerate. I'll try to use that.

We used subprocess for it, but for some reason, only when it is invoked in the pipeline, the process freezes and the main accelerate process never closes. It works fine outside of ClearML. Any idea?

  
  
Posted 2 months ago

How does this work in the context of a pipeline? One of the steps is a multi-GPU training step that requires accelerate.

  
  
Posted 2 months ago

We used subprocess for it, ...

Popen? os.system? fork?

  
  
Posted 2 months ago

We tried both subprocess.run and Popen.
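When a child started with subprocess.run or Popen appears to hang, one generic way to tell a stuck launcher from a slow one is to run it in its own process group with a timeout and kill the whole group if the timeout expires. The sketch below is a POSIX-only debugging aid under that assumption, not the fix the ClearML team released; `run_with_timeout` is a hypothetical helper.

```python
import os
import signal
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Run `cmd` in a new session; return its exit code, or None on timeout (POSIX only)."""
    # start_new_session=True makes the child a session and process-group leader,
    # so any worker processes it spawns share its process-group id (== its pid)
    proc = subprocess.Popen(list(cmd), start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Terminate the launcher and every worker in its group
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group already exited between the timeout and the kill
        proc.wait()  # reap the child to avoid leaving a zombie
        return None


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10))
```

Killing the process group rather than just the parent matters for launchers like accelerate, which spawn one worker per process.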

  
  
Posted 2 months ago

If nothing specific comes to mind, I can try to create some reproducible demo code (after the holiday vacation).

Yes please! 🙏
In the meantime, see if the workaround is a valid one.

  
  
Posted 2 months ago

If nothing specific comes to mind, I can try to create some reproducible demo code (after the holiday vacation).

  
  
Posted 2 months ago

@<1523701949617147904:profile|PricklyRaven28> Can you please try clearml==1.16.2rc0? We have released a fix that will hopefully solve your problem.

  
  
Posted one month ago

@<1523701435869433856:profile|SmugDolphin23> @<1523701205467926528:profile|AgitatedDove14>
Any updates? 🙂

  
  
Posted one month ago

@<1523701949617147904:profile|PricklyRaven28> Thank you for the feedback. We will investigate this further.

  
  
Posted one month ago
287 Views
2 months ago
one month ago