Answered
Hi All! Is There Any Simple Way To Use Argparse To Pass A Clearml Task Name?

Hi all!

Is there any simple way to use argparse to pass a ClearML task name?
I was using an argument called --clearml_task for this, but I ran into an interesting issue: to track args, I need to call Task.init(task_name=args.clearml_task), but to get the args object (and overwrite it on the remote clearml-agent), I need to call args = task.connect(args).
So I have a chicken-and-egg situation.

My solution was a workaround like this:

import os

from clearml import Task


def get_task_if_remote():
    # CLEARML_TASK_ID is set by the clearml-agent on remote runs
    task_id = os.environ.get("CLEARML_TASK_ID")
    if task_id is not None:
        return Task.get_task(task_id=task_id)


if __name__ == "__main__":
    task = get_task_if_remote()

    if task is None:  # first (local) run
        args = get_arg_parser().parse_args()  # get_arg_parser() defined elsewhere in the script
        is_remote = False
    else:  # clearml-agent is running this remotely
        is_remote = True
        args = get_arg_parser().parse_args(["fake_config.yaml", "--clearml_id", task.task_id])

    task, cfg, args = prep_clearml(args)  # prep_clearml() defined elsewhere in the script (not shown)

    if not is_remote:
        task.execute_remotely(queue_name=args.remote)
  
  
Posted 11 months ago

Answers 17


Hi @PlainSeaurchin97

Is there any simple way to use argparse to pass a clearml task name?

need to call args = task.connect(args).

noooo 🙂 there is no need to do that, the arguments are automatically detected.
See for yourself:

args = parse_args()
task = Task.init(task_name=args.task_name)
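
For completeness, here is a self-contained version of that sketch (the argument names and defaults are illustrative, not from the original answer). Task.init hooks argparse, so the parsed values are logged with the task and overridden by the agent on a remote run:

import argparse

from clearml import Task

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--task_name", default="my-experiment")    # hypothetical argument
    parser.add_argument("--project_name", default="my-project")    # hypothetical argument
    return parser.parse_args()

args = parse_args()
# The parsed arguments are picked up automatically; no task.connect(args) needed.
task = Task.init(project_name=args.project_name, task_name=args.task_name)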
  
  
Posted 11 months ago

I actually have a question about your original code snippet, @PlainSeaurchin97. I have been trying to figure out a way to access the task object when running remotely so that I can instantiate the logger, but when I tried task_id = os.getenv("CLEARML_TASK_ID"), it returned None. I also tried Task.current_task() and got None back as well. What is the recommended way to access the Task object from within the remote agent?

  
  
Posted 11 months ago

To be honest, I don't think using this envvar is the best option. I think just getting the task as normal (from the task name, using Task.init) is the better option.

But for edge cases like the one I described, CLEARML_TASK_ID is OK.
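
As a minimal sketch of both options (nothing here beyond the APIs already mentioned in this thread), inside the remotely executed script:

import os

from clearml import Task

task = Task.current_task()  # returns the task if one was initialized in this process, else None
if task is None:
    task_id = os.environ.get("CLEARML_TASK_ID")  # set by the clearml-agent
    if task_id is not None:
        task = Task.get_task(task_id=task_id)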

  
  
Posted 11 months ago

Okay, I take it back. os.getenv("CLEARML_TASK_ID") does work. I forgot to rebuild my container after making the change. Thanks for bringing this option to my attention!

  
  
Posted 11 months ago

Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the ...

did you check this example?

@AgitatedDove14 actually I did! I based my code adaptation around it, since originally I was running a shell script that called torchrun.

But tbh I didn't want to mess too much with my existing code, so I just did a quick and dirty adaptation using the torch.distributed.run command.

  
  
Posted 11 months ago

Hmm yeah I can see why...
Now that I think about it, at least in theory the second process that torch creates should inherit from the main one, and as such Task.init is basically "ignored".
Now I wonder why your first version of the code did not work.
Could it be that we patched the argparser on the subprocess and that we should not have?

  
  
Posted 11 months ago

Oh wait. Do I need the Task to exist in the subprocesses?
I re-create it in the subprocesses because I thought my TensorBoard stuff wouldn't get logged if the task wasn't initialized.

  
  
Posted 11 months ago

Just to be clear, this works on my local machine:

import sys
import torch.distributed.run

distributed_args = torch.distributed.run.parse_args(sys.argv)
distributed_args.nproc_per_node = args.gpus  # args comes from the script's own parser
torch.distributed.run.run(distributed_args)

But not when the clearml-agent runs it.

So the args are patched on the "main" process, but only on the remote worker.

  
  
Posted 11 months ago

Hi @AgitatedDove14, made this mock test real quick, it reproduces the issue:
None

  
  
Posted 11 months ago

Thanks!

Follow-up question: how does ClearML "inject" the argparse arguments before the task is initialized?
Does it mess with sys.argv? Does it inject itself into argparse?

I had to do another workaround, since when torch.distributed.run called its ArgumentParser, it was getting the arguments from my script (and from my task) instead of the ones I passed it.

  
  
Posted 11 months ago

Yes, this is exactly the solution!
Nice 🎉!

  
  
Posted 11 months ago

Are you saying you "manually" parse args?

More or less! Maybe there's a simpler solution that I haven't found yet.

I'm using torch.distributed.run to run my training on multiple GPUs.
Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the following workaround:

distributed_args = torch.distributed.run.parse_args(sys.argv)
distributed_args.nproc_per_node = args.gpus
torch.distributed.run.run(distributed_args)

Which would be the equivalent of calling torchrun train.py arg1 arg2 ...

Except, since clearml patches the parse_args call inside the torch.distributed.run.parse_args function, it generates the same arguments I passed to script.py and gives an error like "error: the following arguments are required: torchrun_arg_1, torchrun_arg_2 ...".

  
  
Posted 11 months ago

Follow-up question: how does ClearML "inject" the argparse arguments before the task is initialized?

It patches the actual parse_args call; to make sure it works, you just need to make sure clearml was imported before the actual call takes place.
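
A minimal sketch of that import-order point (the argument, project, and task names here are hypothetical): as long as clearml is imported before parse_args() runs, the patched call can pick up the arguments even though Task.init happens later.

import argparse

from clearml import Task  # imported before parse_args() so the patched call sees the parser

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)  # hypothetical argument
args = parser.parse_args()

# The arguments parsed above are connected to the task created here.
task = Task.init(project_name="demo", task_name="import-order-sketch")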

I had to do another workaround, since when torch.distributed.run called its ArgumentParser, it was getting the arguments from my script (and from my task) instead of the ones I passed it.

Are you saying you "manually" parse args?

  
  
Posted 11 months ago

My final solution was to manually detect whether I needed to patch the original argparse in the training script (by using the CLEARML_TASK_ID envvar) and to turn off the automatic argparse connection.
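
A rough, hypothetical sketch of that idea (the parser, project, and task names are made up here; the poster's actual final code appears later in the thread):

import argparse
import os

from clearml import Task

# CLEARML_TASK_ID is set by the clearml-agent, so this distinguishes the
# original local run from the remotely executed clone.
running_under_agent = os.environ.get("CLEARML_TASK_ID") is not None

task = Task.init(
    project_name="my-project",        # hypothetical names
    task_name="my-task",
    auto_connect_arg_parser=False,    # the automatic argparse connection is turned off
)

parser = argparse.ArgumentParser()    # stand-in for the script's own parser
parser.add_argument("--gpus", type=int, default=1)
args = parser.parse_args()

# Connect the arguments manually instead: logged on the local run,
# overridden with the stored values when the agent re-runs the task.
hyperparams = task.connect(vars(args))

if running_under_agent:
    args = argparse.Namespace(**hyperparams)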

  
  
Posted 11 months ago

  • Yes, Task.init should be called on each subprocess (because torch forks them before they are patched); see the sketch below
  • I think the main issue is that we patch the argparse on the subprocess (this is assuming you did not manually parse non-argv arguments)
  • If you can create a mock test I think we can work around the issue, as long as the way you spin it up is the standard pytorch distributed way
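
A minimal sketch of the first bullet (the entry point, names, and reported text are hypothetical; this is the per-rank script that torch.distributed.run spawns):

import os

from clearml import Task

def main():
    # Each spawned rank calls Task.init; ClearML should attach the child
    # processes to the parent task rather than creating new ones, so
    # TensorBoard/console output from every rank is captured.
    task = Task.init(project_name="my-project", task_name="ddp-sketch")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by torch.distributed.run
    task.get_logger().report_text(f"rank {local_rank} started")

if __name__ == "__main__":
    main()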
  
  
Posted 11 months ago

Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the ...

@PlainSeaurchin97 did you check this example?
None

  
  
Posted 11 months ago

OK, so I got into this mess with the argparse because I was turning OFF the automatic detection of command line arguments.

I was turning it off because I was calling, inside my script, the argparser from torch.distributed.run (the best way I found to run a torchrun command on the clearml-agent).

Because of torch.distributed.run, clearml was automatically tracking nonexistent command line arguments, which led to an error on the remote agent.

In case this happens to anyone else, my solution was the following:

valid_args = {action.dest: True for action in get_arg_parser()._actions}
task = Task.init(
    project_name=args.project_name,
    task_name=args.task_name,
    # only connect OUR args (not the ones torch.distributed.run adds)
    auto_connect_arg_parser={**valid_args, "*": False},
)
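
(For reference: auto_connect_arg_parser also accepts a dictionary mapping argument names to booleans, with the "*" key setting the policy for any argument not listed, which is how the snippet above whitelists only the script's own arguments and ignores the ones coming from torch.distributed.run.)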
  
  
Posted 11 months ago