Hi, I am on clearml == 1.9.0 and I am having the same issue. Is there a recommended workaround or plans to fix it?
Yes. Here is some simple code that reproduces the issue:
import argparse
from datetime import datetime

import clearml


def train(args):
    print(f"Running training: {args}")


def test(args):
    print(f"Running testing: {args}")


def parse_args():
    parser = argparse.ArgumentParser()
    subparser = parser.add_subparsers(dest="subparser")
    train_parser = subparser.add_parser("train")
    train_parser.add_argument("project", type=str)
    train_parser.add_argument("queue", typ...
When the execution starts locally, the args look like this:
Namespace(subparser='train', project='test', epochs=0)
then remotely they get converted to:
Namespace(subparser="['train', '--project', 'test', '--epoch', '0']", project='test', epochs=0)
Which is similar to what Tim reported a few messages above.
So when I do something like if args.subparser == "train": ... in the code, I get the normal behaviour locally (i.e. True), but not remotely because args.subparser ...
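To make the mismatch concrete, here is a minimal sketch that does not need ClearML at all. It builds a parser of the same shape, then compares the local value of `args.subparser` with the string reported from the remote run. The `--project` and `--epochs` options are assumptions taken from the Namespace output above (the original script is truncated), and `normalize_subparser` is a hypothetical helper, not part of ClearML or argparse. It is only one possible defensive workaround, not a fix for the underlying behaviour.

```python
import argparse
import ast


def build_parser():
    # Same shape as the parser in the thread; option names are assumed
    # from the Namespace output, since the original script is truncated.
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest="subparser")
    train_parser = subparsers.add_parser("train")
    train_parser.add_argument("--project", type=str)
    train_parser.add_argument("--epochs", type=int, default=0)
    subparsers.add_parser("test")
    return parser


def normalize_subparser(value):
    """Hypothetical helper: recover the sub-command name from either form."""
    if isinstance(value, str) and value.startswith("["):
        try:
            parsed = ast.literal_eval(value)  # safely parse the stringified list
        except (ValueError, SyntaxError):
            return value
        if isinstance(parsed, (list, tuple)) and parsed:
            return str(parsed[0])  # first element is the sub-command
    return value


local_args = build_parser().parse_args(["train", "--project", "test", "--epochs", "0"])
# The value of args.subparser as reported from the remote execution.
remote_value = "['train', '--project', 'test', '--epoch', '0']"

print(local_args.subparser == "train")              # True
print(remote_value == "train")                      # False
print(normalize_subparser(remote_value) == "train") # True
```

With this, `if normalize_subparser(args.subparser) == "train":` would behave the same in both cases, assuming the remote value always has the form shown above.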
Thank you @<1523701205467926528:profile|AgitatedDove14>. I have Task.init() right at the beginning of the script (i.e. before multiprocessing), but I don’t have the Task.current_task() call, so maybe that would solve the issue. Where should that be? In the function that is parallelised? Or can it also be right after Task.init()?
Thanks, I’ll try that and report back.
Hi @<1523701205467926528:profile|AgitatedDove14>, I can confirm that calling Task.current_task() makes ClearML log the console, models and scalars again 🙂
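For anyone landing here later, a rough sketch of what this can look like. It assumes the `Task.current_task()` call goes inside the parallelised function, which the messages above do not explicitly confirm. The project and task names are placeholders, and the `try/except` around the import is only there so the sketch can be loaded without ClearML installed.

```python
import multiprocessing as mp

try:
    from clearml import Task  # Task.init and Task.current_task are ClearML SDK calls
except ImportError:
    Task = None  # lets the sketch be imported without ClearML installed


def worker(x):
    # Assumption: touching the current task inside the subprocess is what
    # lets ClearML pick up console output, models and scalars again.
    if Task is not None:
        Task.current_task()
    return x * x


def main():
    if Task is not None:
        # Placeholder names; Task.init() is called before any multiprocessing.
        Task.init(project_name="examples", task_name="multiprocessing-demo")
    with mp.Pool(processes=2) as pool:
        results = pool.map(worker, range(4))
    print(results)
    return results
```

`main()` should be called under an `if __name__ == "__main__":` guard, as with any script that starts worker processes.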
I’m not sure if this was solved, but I am encountering a similar issue. From what I see, it all depends on which multiprocessing start method is used. When using fork, ClearML works fine and is able to capture everything; however, using fork is not recommended because it is not safe with multithreading. With spawn and forkserver (which is used in the script above), ClearML is not able to automatically...
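For reference, a small sketch of how the start method can be selected explicitly with the standard library, which makes it easy to test each method in turn. `pick_start_method`, `square` and `run` are hypothetical names for illustration only, and nothing here is ClearML-specific.

```python
import multiprocessing as mp


def pick_start_method(preferred=("forkserver", "spawn")):
    """Return the first preferred start method available on this platform."""
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    return available[0]  # the first entry is the platform default


def square(x):
    return x * x


def run(method=None):
    # get_context() returns a context bound to one start method, so the
    # global default is left untouched.
    ctx = mp.get_context(method or pick_start_method())
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(4))
```

Calling `run("fork")`, `run("spawn")` and `run("forkserver")` from under an `if __name__ == "__main__":` guard is one way to compare what gets captured with each method; the guard is required for spawn and forkserver, and fork and forkserver are not available on Windows.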