My clearml-server crashed for some reason, so I won't be able to verify until tomorrow.
It seems to work when I enable
Or maybe even better: How can I get all the information of the "INFO" page in the WebUI of a task?
Locally it works fine.
Yes, I did not change this part of the config.
So args that are not specified are not None as intended; they simply do not exist in args. And command is a list instead of a single str.
args is similar to what is shown in print(args) when executed remotely.
With remote execution it is command="[...]", but locally it is command='train' as it is supposed to be.
Ah, it actually is also a string with remote_execution, but still not what it should be.
And in the WebUI I can see arguments similar to the second print statement's.
Good, at least now I know it is not a user-error 😄
If you compare the two outputs I put at the top of this thread (one from local execution, the other from remote execution), it seems like command is different and wrong on remote.
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True).
The script is intended to be used something like this:
script.py train my_model --steps 10000 --checkpoint-every 10000
script.py test my_model --steps 1000
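For reference, a minimal sketch of how a CLI like the one above could be wired up with argparse subcommands; the subcommand name lands in args.command as a plain str when parsed locally. The --enqueue option and the ClearML calls (shown as comments, since they need a running server) are assumptions based on the thread, not a verified snippet of the actual script:

```python
import argparse

def parse_args(argv=None):
    # Shared options available on every subcommand
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--enqueue", default=None,
                        help="queue name for remote execution (assumed option)")

    parser = argparse.ArgumentParser(prog="script.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", parents=[common])
    train.add_argument("model")
    train.add_argument("--steps", type=int, default=10000)
    train.add_argument("--checkpoint-every", type=int, default=10000)

    test = subparsers.add_parser("test", parents=[common])
    test.add_argument("model")
    test.add_argument("--steps", type=int, default=1000)

    return parser.parse_args(argv)

# Parsing the 'train' invocation from the thread:
args = parse_args(["train", "my_model", "--steps", "10000",
                   "--checkpoint-every", "10000"])
print(args.command)  # prints: train  (a plain str when parsed locally)

# The ClearML part would then follow, roughly:
# from clearml import Task
# task = Task.init(project_name="...", task_name="...")
# if args.enqueue:
#     task.execute_remotely(queue_name=args.enqueue, clone=False,
#                           exit_process=True)
```

Locally this yields Namespace(enqueue=None, command='train', model='my_model', steps=10000, checkpoint_every=10000), which matches the "command='train' like it is supposed to be" observation above.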
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
Never mind, I forgot to start my agent with --docker. So here comes my follow-up question: it seems like there is no way to define that a Task requires docker support from an agent, right?
@<1576381444509405184:profile|ManiacalLizard2> Just so I understand correctly:
You are saying that in your local, user-specific clearml.conf you set api.files_server, but in your remote clearml-agent clearml.conf you left it empty?
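For context, such an override in clearml.conf typically looks like the fragment below. The host, port, and bucket are placeholder values, not ones taken from this thread:

```
# ~/clearml.conf (hypothetical values)
api {
    # Redirect artifact/file uploads away from the default fileserver,
    # e.g. to a MinIO/S3 bucket:
    files_server: "s3://my-minio-host:9000/clearml-bucket"
}
```

Since each clearml-agent reads its own clearml.conf, an override like this only takes effect on machines where that file actually contains it.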
I think in the paid version there is this configuration vault, so that the user can pass their own credentials securely to the agent.
However, I cloned the experiment again via the web UI. Then I enqueued it.
No reason in particular. How many people work at http://allegro.ai ?
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
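If it helps, scalars can be pulled per task and aggregated client-side. Below is a sketch assuming the nested dict shape returned by clearml's Task.get_reported_scalars(), i.e. {title: {series: {"x": [...], "y": [...]}}}; the metric names and values here are hypothetical, and the shape should be verified against your clearml version:

```python
from statistics import mean, median

def aggregate_scalars(per_task_scalars, title, series, fn=mean):
    """Aggregate one scalar series point-wise across several experiments.

    per_task_scalars: list of dicts shaped like the (assumed) output of
    Task.get_reported_scalars(): {title: {series: {"x": [...], "y": [...]}}}.
    fn: aggregation function, e.g. mean, median, or max.
    """
    ys = [scalars[title][series]["y"] for scalars in per_task_scalars]
    # Point-wise aggregation over the iterations shared by all runs
    return [fn(values) for values in zip(*ys)]

# Hypothetical scalars from two experiments:
run_a = {"loss": {"train": {"x": [0, 1, 2], "y": [4.0, 2.0, 1.0]}}}
run_b = {"loss": {"train": {"x": [0, 1, 2], "y": [2.0, 1.0, 0.5]}}}

print(aggregate_scalars([run_a, run_b], "loss", "train"))          # [3.0, 1.5, 0.75]
print(aggregate_scalars([run_a, run_b], "loss", "train", fn=max))  # [4.0, 2.0, 1.0]
```

In a real script the per_task_scalars list would come from something like [Task.get_task(task_id=i).get_reported_scalars() for i in task_ids]; the aggregation itself is plain Python either way.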
So I have to specify it on every clearml-agent in the respective clearml.conf?
Is there a simple way to get the response of the MinIO instance? Then I could verify whether the problem is the MinIO instance or my client.
Maybe there is something wrong with my setup. Conda confuses me sometimes.