Just to make sure I understand: running locally creates the Args/command correctly, and execute_remotely also creates the correct Args/command, but when the agent actually executes the task it writes Args/command back as a list. Is that a correct description?
Ah, it actually is also a string with remote_execution, but still not what it should be.
When I passed specific arguments (for example --steps) it ignored them...
script.py test blah1 blah2 blah3 42
Is this how it is intended to be used ?
With remote_execution it is command="[...]", but on local it is command='train', like it is supposed to be.
Args is similar to what is shown in print(args) when executed remotely.
Hi ReassuredTiger98
It's clearml that needs to support subparser, and it does support it.
What are you seeing in the Args section ?
(Notice that in the end all the parsed arguments are stored on the global "args" variable after you call parse_args(); clearml will basically take those variables and put them into the Args section)
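For reference, here is a minimal sketch of how plain argparse (without ClearML) fills the namespace when a subparser is used. The parser is hypothetical: the `command` dest, the `preset` positional and the `--steps` option are assumptions that only mirror the train/test usage discussed in this thread, not the actual script.

```python
import argparse


def build_parser():
    # Hypothetical parser: names are illustrative, not taken from the real script.
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="command")

    train = sub.add_parser("train")
    train.add_argument("preset", type=str)
    train.add_argument("--steps", type=int)

    test = sub.add_parser("test")
    test.add_argument("preset", type=str)
    test.add_argument("--steps", type=int)
    return parser


# The subparser name is stored as a plain string under the chosen dest.
args = build_parser().parse_args(["train", "my_model", "--steps", "10000"])
print(args.command)  # → train

# An option that is not passed still exists on the namespace, with value None.
no_steps = build_parser().parse_args(["test", "my_model"])
print(no_steps.steps)  # → None
```

So with argparse alone, `command` should be a single string and unspecified options should be present as None, which is the baseline the reports in this thread are compared against.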
Thanks @CloudyArcticwolf80! Let me see if we can reproduce it.
With remote_execution it is command="[...]", but on local it is command='train', like it is supposed to be.
I'm not sure I follow, could you expand ?
So args that are not specified are not None as intended, but simply do not exist in args. And command is a list instead of a single str.
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True).
The script is intended to be used something like this: script.py train my_model --steps 10000 --checkpoint-every 10000
or script.py test my_model --steps 1000
And command is a list instead of a single str
By "command list", you mean the command argument?
Thanks ReassuredTiger98 , yes that makes sense.
What's the python version you are using ?
And in the WebUI I can see arguments similar to the second print statement's.
And does the code above reproduce the issue/bug? Because obviously this should not happen.
Here is some code that shows exactly what goes wrong. I do local execution only. It seems not to be related to remote execution as I thought, but more related to clearml.Task:
` args = parser.parse_args()
print(args) # FIRST OUTPUT
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=[environment_name],
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)
print(task.get_parameters()) # SECOND OUTPUT `
First output, print(args):
Namespace(checkpoint=None, checkpoint_every=1000, checkpoint_test_every=1000, command='train', device='cuda', enqueue=None, environment='walker_stand', jit=False, mixed_precision=False, name=None, nvidia_docker=False, preset='rlad.modules.dreamer.presets.dmc.original', render=False, steps=5000000, symbolic_obs=False, test_every=2000, test_steps=1000, track_remote=True, type='dmc')
Second output, print(task.get_parameters()):
{'Args/command': "['train', 'rlad.modules.dreamer.presets.dmc.original', 'dmc', 'walker_stand', '5000000', '--test-steps', '1000', '--test-every', '2000', '--checkpoint-test-every', '1000', '--checkpoint-every', '1000', '--track-remote']", 'Args/preset': 'rlad.modules.dreamer.presets.dmc.original', 'Args/type': 'dmc', 'Args/environment': 'walker_stand', 'Args/nvidia_docker': 'False', 'Args/enqueue': '', 'Args/track_remote': 'True', 'Args/device': 'cuda', 'Args/name': '', 'Args/render': 'False', 'Args/checkpoint': '', 'Args/symbolic_obs': 'False', 'Args/mixed_precision': 'False', 'Args/jit': 'False', 'Args/steps': '5000000', 'Args/checkpoint_every': '1000', 'Args/checkpoint_test_every': '1000', 'Args/test_every': '2000', 'Args/test_steps': '1000'}
Good, at least now I know it is not a user-error 😄
What I get for args when I print it locally is not the same as what ClearML extracts from args.
Yes. Here is some simple code that reproduces the issue:
import argparse
from datetime import datetime

import clearml


def train(args):
    print(f"Running training: {args}")


def test(args):
    print(f"Running testing: {args}")


def parse_args():
    parser = argparse.ArgumentParser()
    subparser = parser.add_subparsers(dest="subparser")

    train_parser = subparser.add_parser("train")
    train_parser.add_argument("project", type=str)
    train_parser.add_argument("queue", type=str)
    train_parser.add_argument("--epochs", type=int)

    test_parser = subparser.add_parser("test")
    test_parser.add_argument("model_path", type=str)
    test_parser.add_argument("--metric", type=str)

    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parse_args()
    print(args)
    task = clearml.Task.init(args.project, datetime.now().strftime("%Y%m%d-%H%M%S"))
    task.execute_remotely(queue_name=args.queue)
    if args.subparser == "train":
        train(args)
    elif args.subparser == "test":
        test(args)
    else:
        raise ValueError(f"Invalid args: {args}")
Locally it prints: Namespace(subparser='train', project='my-clearml-project', queue='worker-queue', epochs=2)
In the remote console: Namespace(subparser="['train', 'my-clearml-project', 'worker-queue', '--epochs', '2']", project='my-clearml-project', queue='worker-queue', epochs=2)
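One thing the two printouts suggest: the remote value of `subparser` looks exactly like the string form of the full argument list. The sketch below only checks that observation with plain Python; it does not show where ClearML does the conversion, and the variable names are made up for illustration.

```python
# Arguments as they would have been passed on the command line in the run above.
argv = ["train", "my-clearml-project", "worker-queue", "--epochs", "2"]

# Value of args.subparser as copied from the remote console output.
remote_value = "['train', 'my-clearml-project', 'worker-queue', '--epochs', '2']"

# The remote value is identical to str() applied to the argument list.
print(str(argv) == remote_value)  # → True

# That is why an equality check against the subcommand name fails remotely.
print(remote_value == "train")  # → False
```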
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
@CloudyArcticwolf80 what are you seeing in the Args section ?
what exactly is not working ?
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
If you compare the two outputs I put at the top of this thread, one being the output when executed locally and the other the output when executed remotely, it seems like command is different and wrong on remote.
` args = parser.parse_args()
print(args) # args PRINTED HERE ON LOCAL
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=[environment_name],
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)
# Execute remotely via ClearML
if enqueue is not None:
    task.execute_remotely(queue_name=enqueue, clone=False, exit_process=True) `
I can verify the behavior, I think it has to do with the way the subparser was setup.
This was the only way for me to get it to run: script.py test blah1 blah2 blah3 42
When I passed specific arguments (for example --steps) it ignored them...
When the execution starts locally the args are like:
Namespace(subparser='train', project='test', epochs=0)
then remotely they get converted to:
Namespace(subparser="['train', '--project', 'test', '--epoch', '0']", project='test', epochs=0)
Which is similar to what Tim reported a few messages above.
So when in the code I do something like if args.subparser == "train": ... I get the expected behaviour locally (i.e. True), but not remotely, because args.subparser is actually that weird string.
I solved it by changing the condition to if "train" in args.subparser, which works in both situations, but it's not very safe 🙂
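A slightly stricter variant of that workaround, as a sketch only: `normalize_subcommand` is a hypothetical helper (not part of ClearML) that assumes the remote value is either the plain subcommand name or the string form of a Python list whose first element is the subcommand, as in the outputs above.

```python
import ast


def normalize_subcommand(value):
    """Return the subcommand name from 'train' or from "['train', ...]"."""
    # Already a real list/tuple: the subcommand is assumed to be the first item.
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    # Stringified list, as seen in the remote Namespace printouts.
    if isinstance(value, str) and value.startswith("["):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a valid literal, leave it untouched
        if isinstance(parsed, list) and parsed:
            return parsed[0]
    return value


local_value = "train"
remote_value = "['train', '--project', 'test', '--epoch', '0']"

print(normalize_subcommand(local_value))   # → train
print(normalize_subcommand(remote_value))  # → train
```

Unlike the substring check, this compares the whole subcommand name, so a hypothetical subcommand such as `retrain` would not be mistaken for `train`. It is still only a stopgap on the script side and does not address the underlying behaviour.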
Hi, I am on clearml == 1.9.0 and I am having the same issue.
Is there a recommended workaround or plans to fix it?