
My clearml-server crashed for some reason, so I won't be able to verify until tomorrow.
It seems to work when I enable conda_freeze.
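For anyone else hitting this: I believe the flag lives in the agent section of clearml.conf, roughly like below (a sketch; double-check the exact section in your own config):

agent {
    package_manager {
        # assumption: with conda_freeze enabled the agent reproduces the full
        # frozen conda environment instead of resolving packages itself
        conda_freeze: true
    }
}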
Or maybe even better: How can I get all the information of the "INFO" page in the WebUI of a task?
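Something like this is what I'm after, assuming Task.get_task() plus export_task() return roughly the same fields as the INFO tab (a sketch; the task ID is a placeholder):

from clearml import Task

# fetch an existing task by the ID shown in the WebUI URL (placeholder below)
task = Task.get_task(task_id="<your_task_id>")

# export_task() returns a dict with the task definition (name, status,
# script/repo info, parameters, ...), similar to what the INFO tab shows
info = task.export_task()
print(info.keys())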
Locally it works fine.
Yes, I did not change this part of the config.
So missing args that are not specified are not None as intended; they simply do not exist in args. And command is a list instead of a single str.
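To stay robust either way I'm guarding the attribute access, roughly like this (a sketch; the flag names are just the ones from my own parser):

from argparse import Namespace

# simulate what I see remotely: a Namespace that is missing the optional flags entirely
args = Namespace(command="test", model="my_model")

# missing options do not come back as None, so fall back via getattr()
steps = getattr(args, "steps", None)
checkpoint_every = getattr(args, "checkpoint_every", None)
print(steps, checkpoint_every)  # -> None None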
args is similar to what is shown by print(args) when executed remotely.
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
Ah, it actually is also a string with remote_execution, but still not what it should be.
And in the WebUI I can see arguments similar to the second print statement's.
Good, at least now I know it is not a user-error 😄
If you compare the two outputs I put at the top of this thread, one being the output when executed locally and the other being the output when executed remotely, it seems like command is different and wrong on remote.
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True).
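Stripped down, the order of calls looks roughly like this (a sketch; the project/task names and the --enqueue definition are simplified, only the execute_remotely call is verbatim from my script):

import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--enqueue", default=None, help="queue name; if set, run remotely")
args = parser.parse_args()

task = Task.init(project_name="my_project", task_name="my_experiment")

if args.enqueue:
    # hand the task over to an agent listening on the given queue and stop the local run
    task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True)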
The script is intended to be used something like this: script.py train my_model --steps 10000 --checkpoint-every 10000 or script.py test my_model --steps 1000
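The CLI is built with argparse subcommands, roughly like this (a simplified sketch, not the full parser):

import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command")  # 'train' or 'test'

train = subparsers.add_parser("train")
train.add_argument("model")
train.add_argument("--steps", type=int)
train.add_argument("--checkpoint-every", type=int)

test = subparsers.add_parser("test")
test.add_argument("model")
test.add_argument("--steps", type=int)

args = parser.parse_args()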
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: it seems like there is no way to define that a Task requires docker support from an agent, right?
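The closest I found is declaring the image on the task itself, e.g. roughly like this (a sketch; the image name is arbitrary, and as far as I can tell the agent still has to be started with --docker for it to matter):

from clearml import Task

task = Task.init(project_name="my_project", task_name="my_experiment")

# request a specific docker image for this task; an agent running in docker
# mode should pick it up (image name is just an example)
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")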
@<1576381444509405184:profile|ManiacalLizard2> Just so I understand correctly:
You are saying that in your local, user-specific clearml.conf you set the api.files_server, but in your remote, clearml-agent clearml.conf you left it empty?
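i.e. something along these lines in the api section of your local ~/clearml.conf (hosts and bucket here are placeholders):

api {
    web_server: "http://my-clearml-server:8080"
    api_server: "http://my-clearml-server:8008"
    # user-specific files_server pointing at your own storage
    files_server: "s3://my_minio_instance:9000/my_bucket"
}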
I think in the paid version there is this configuration vault, so that the user can pass their own credentials securely to the agent.
However, I cloned the experiment again via the web UI. Then I enqueued it.
No reason in particular. How many people work at http://allegro.ai ?
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
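For context, this is the kind of thing I mean, which I could of course do client-side (a sketch; task IDs and metric/series names are placeholders, and I'm assuming get_reported_scalars() returns a title -> series -> x/y dict):

from statistics import mean
from clearml import Task

# hypothetical task IDs of the experiments to aggregate
task_ids = ["<id_1>", "<id_2>", "<id_3>"]

final_values = []
for tid in task_ids:
    scalars = Task.get_task(task_id=tid).get_reported_scalars()
    # take the last reported value of one series per experiment
    final_values.append(scalars["validation"]["loss"]["y"][-1])

print("mean:", mean(final_values), "max:", max(final_values))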
Is there a simple way to get the response from the MinIO instance? Then I could verify whether the problem is the MinIO instance or my client.
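In the meantime I'm checking the MinIO side directly with boto3, which at least surfaces the raw error response (a sketch; endpoint, credentials and bucket name are placeholders):

import boto3
from botocore.exceptions import ClientError

# talk to the MinIO instance directly, bypassing clearml, to see its raw response
s3 = boto3.client(
    "s3",
    endpoint_url="http://my_minio_instance:9000",
    aws_access_key_id="<access_key>",
    aws_secret_access_key="<secret_key>",
)

try:
    s3.head_bucket(Bucket="my_bucket")
    print("bucket reachable")
except ClientError as err:
    print("MinIO error response:", err.response)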
Maybe there is something wrong with my setup. Conda confuses me sometimes.
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist and it will work.