Thanks @<1527459125401751552:profile|CloudyArcticwolf80> ! let me see if we can reproduce it
Yes. Here is some simple code that reproduces the issue:
import argparse
from datetime import datetime

import clearml

def train(args):
    print(f"Running training: {args}")

def test(args):
    print(f"Running testing: {args}")

def parse_args():
    parser = argparse.ArgumentParser()
    subparser = parser.add_subparsers(dest="subparser")
    train_parser = subparser.add_parser("train")
    train_parser.add_argument("project", type=str)
    train_parser.add_argument("queue", type=str)
    train_parser.add_argument("--epochs", type=int)
    test_parser = subparser.add_parser("test")
    test_parser.add_argument("model_path", type=str)
    test_parser.add_argument("--metric", type=str)
    args = parser.parse_args()
    return args

if __name__ == "__main__":
    args = parse_args()
    print(args)
    task = clearml.Task.init(args.project, datetime.now().strftime("%Y%m%d-%H%M%S"))
    task.execute_remotely(queue_name=args.queue)
    if args.subparser == "train":
        train(args)
    elif args.subparser == "test":
        test(args)
    else:
        raise ValueError(f"Invalid args: {args}")
Locally it prints: Namespace(subparser='train', project='my-clearml-project', queue='worker-queue', epochs=2)
In the remote console: Namespace(subparser="['train', 'my-clearml-project', 'worker-queue', '--epochs', '2']", project='my-clearml-project', queue='worker-queue', epochs=2)
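For comparison, here is a minimal sketch of the same parser run without ClearML, with the command line from the report passed explicitly instead of through sys.argv. It only shows the argparse baseline (the sub-command comes back as a plain string); build_parser is a name made up for this example.

```python
import argparse


def build_parser():
    # Same parser layout as in the reproduction script, minus ClearML.
    parser = argparse.ArgumentParser()
    subparser = parser.add_subparsers(dest="subparser")
    train_parser = subparser.add_parser("train")
    train_parser.add_argument("project", type=str)
    train_parser.add_argument("queue", type=str)
    train_parser.add_argument("--epochs", type=int)
    test_parser = subparser.add_parser("test")
    test_parser.add_argument("model_path", type=str)
    test_parser.add_argument("--metric", type=str)
    return parser


# The command line from the report, given as an explicit argv list.
argv = ["train", "my-clearml-project", "worker-queue", "--epochs", "2"]
args = build_parser().parse_args(argv)
print(args)
```

With plain argparse, args.subparser is the string 'train', so the list-like string seen in the remote console is not produced by the parser definition itself.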
And does the code above reproduce the issue? Because obviously this should not happen.
When the execution starts locally, the args look like:
Namespace(subparser='train', project='test', epochs=0)
then remotely they get converted to:
Namespace(subparser="['train', '--project', 'test', '--epoch', '0']", project='test', epochs=0)
Which is similar to what Tim reported a few messages above.
So when in the code I do something like if args.subparser == "train": ... I get normal behaviour locally (i.e. True), but not remotely, because args.subparser is actually that weird string.
I solved it by changing the condition to if "train" in args.subparser, which works in both situations, but it’s not very safe 🙂
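A slightly stricter alternative to the substring check, as a minimal sketch only. It assumes the remote value is the Python repr of the original argv list (as in the outputs above) and that the sub-command is its first element. normalize_subcommand is a hypothetical helper, not part of clearml.

```python
import ast


def normalize_subcommand(value, choices):
    """Return the sub-command name from either a clean or a list-like string value."""
    if value in choices:
        # Normal case: argparse stored the plain sub-command name.
        return value
    parsed = None
    if isinstance(value, str):
        try:
            # Assumption: the odd value is the repr of the argv list.
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            parsed = None
    if isinstance(parsed, (list, tuple)) and parsed and parsed[0] in choices:
        return parsed[0]
    raise ValueError(f"Cannot determine sub-command from {value!r}")


choices = {"train", "test"}
local_value = "train"
remote_value = "['train', '--project', 'test', '--epoch', '0']"
print(normalize_subcommand(local_value, choices))
print(normalize_subcommand(remote_value, choices))
```

Unlike "train" in args.subparser, this does not match when "train" merely appears somewhere inside another argument, such as a project name.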
@<1527459125401751552:profile|CloudyArcticwolf80> what are you seeing in the Args section ?
what exactly is not working ?
Hi, I am on clearml == 1.9.0 and I am having the same issue.
Is there a recommended workaround or plans to fix it?
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
The script is intended to be used something like this: script.py train my_model --steps 10000 --checkpoint-every 10000
or script.py test my_model --steps 1000
When I passed specific arguments (for example --steps) it ignored them...
script.py test blah1 blah2 blah3 42
Is this how it is intended to be used ?
Good, at least now I know it is not a user-error 😄
I can verify the behavior. I think it has to do with the way the subparser was set up.
This was the only way for me to get it to run: script.py test blah1 blah2 blah3 42
When I passed specific arguments (for example --steps) it ignored them...
Thanks ReassuredTiger98 , yes that makes sense.
What's the python version you are using ?
And in the WebUI I can see arguments similar to the second print statement's.
Here is some code that shows exactly what goes wrong. I do local execution only. It seems not to be related to remote execution as I thought, but more related to clearml.Task:
` args = parser.parse_args()
print(args)  # FIRST OUTPUT
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=[environment_name],
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)
print(task.get_parameters())  # SECOND OUTPUT `
First print, print(args):
Namespace(checkpoint=None, checkpoint_every=1000, checkpoint_test_every=1000, command='train', device='cuda', enqueue=None, environment='walker_stand', jit=False, mixed_precision=False, name=None, nvidia_docker=False, preset='rlad.modules.dreamer.presets.dmc.original', render=False, steps=5000000, symbolic_obs=False, test_every=2000, test_steps=1000, track_remote=True, type='dmc')
Second print, print(task.get_parameters()):
{'Args/command': "['train', 'rlad.modules.dreamer.presets.dmc.original', 'dmc', 'walker_stand', '5000000', '--test-steps', '1000', '--test-every', '2000', '--checkpoint-test-every', '1000', '--checkpoint-every', '1000', '--track-remote']", 'Args/preset': 'rlad.modules.dreamer.presets.dmc.original', 'Args/type': 'dmc', 'Args/environment': 'walker_stand', 'Args/nvidia_docker': 'False', 'Args/enqueue': '', 'Args/track_remote': 'True', 'Args/device': 'cuda', 'Args/name': '', 'Args/render': 'False', 'Args/checkpoint': '', 'Args/symbolic_obs': 'False', 'Args/mixed_precision': 'False', 'Args/jit': 'False', 'Args/steps': '5000000', 'Args/checkpoint_every': '1000', 'Args/checkpoint_test_every': '1000', 'Args/test_every': '2000', 'Args/test_steps': '1000'}
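To spot which entries differ between the two prints without reading them by eye, something like the sketch below could help. It is plain Python and makes two assumptions taken from the dump above: stored values are strings, and None is stored as an empty string. find_mismatches is a hypothetical helper, and the sample data is a shortened copy of the two outputs.

```python
import argparse


def find_mismatches(args, params, section="Args"):
    """Return {name: (local_value, stored_value)} for entries that differ."""
    mismatches = {}
    for name, value in vars(args).items():
        key = f"{section}/{name}"
        if key not in params:
            continue
        # Assumption from the dump above: values are stored as strings, None as ''.
        expected = "" if value is None else str(value)
        if params[key] != expected:
            mismatches[name] = (value, params[key])
    return mismatches


# Shortened copies of the two outputs above (command list truncated for brevity).
args = argparse.Namespace(command="train", enqueue=None, steps=5000000, track_remote=True)
params = {
    "Args/command": "['train', 'rlad.modules.dreamer.presets.dmc.original', 'dmc', 'walker_stand', '5000000']",
    "Args/enqueue": "",
    "Args/steps": "5000000",
    "Args/track_remote": "True",
}
print(find_mismatches(args, params))
```

On this sample only command is reported, which matches what the two prints show: every other value round-trips as its string form.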
Just to make sure I understand: running locally creates the Args/command correctly, but when it is actually executed on the remote machine (i.e. execute_remotely creates the correct Args/command, but when the agent actually executes it) the Args/command gets updated back to a list. Is that a correct description?
` args = parser.parse_args()
print(args)  # args PRINTED HERE ON LOCAL
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=[environment_name],
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)
# Execute remotely via ClearML
if enqueue is not None:
    task.execute_remotely(queue_name=enqueue, clone=False, exit_process=True) `
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True).
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
What I get for args when I print it locally is not the same as what ClearML extracts from args.
If you compare the two outputs I put at the top of this thread, one being the output when executed locally and the other the output when executed remotely, it seems like command is different and wrong on remote.
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
I'm not sure I follow, could you expand ?
Ah, it actually is also a string with remote_execution, but still not what it should be.
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
And command is a list instead of a single str
"command list", you mean the command argument?
So args that are not specified are not None like intended, but just do not exist in args. And command is a list instead of a single str.
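Until the cause is known, one defensive pattern for attributes that may be absent is to read them with getattr and an explicit default. This is only a sketch with a made-up namespace that lacks checkpoint; it does not address why the attribute is missing.

```python
import argparse

# Hypothetical namespace where an optional argument was never set at all.
args = argparse.Namespace(command="train", steps=5000000)

# getattr with a default returns None instead of raising AttributeError.
checkpoint = getattr(args, "checkpoint", None)
steps = getattr(args, "steps", None)
print(checkpoint, steps)
```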
Args is similar to what is shown in print(args) when executed remotely.
Hi ReassuredTiger98
It's clearml that needs to support subparser, and it does support it.
What are you seeing in the Args section ?
(Notice that in the end all the parsed args are stored in the global "args" variable after you call parse_args(); clearml will basically take those variables and put them into the Args section.)