One more related question (I hope there's a similar solution): when I log images, they appear in the UI with http://<my-ip>, so they are inaccessible (they should be translated to None). Is there any path_substitution variant for this scenario in the config? I can't seem to find it in the docs. Thanks!
Neither, metric is a number you report through the Logger:
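A minimal sketch of what that reporting could look like, assuming the logger exposes report_scalar(title, series, value, iteration) the way clearml.Logger does. The RecordingLogger class, the report_accuracy helper and the "validation"/"accuracy" names are hypothetical, only there so the snippet runs without a ClearML server:

```python
class RecordingLogger:
    """Hypothetical stand-in that accepts the same report_scalar keywords as clearml.Logger."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        # Record the call instead of sending it to a server.
        self.calls.append((title, series, value, iteration))


def report_accuracy(logger, accuracy, epoch):
    """Report one validation accuracy value as a scalar metric."""
    logger.report_scalar(
        title="validation", series="accuracy", value=accuracy, iteration=epoch
    )


# With ClearML installed and a task initialised, the real logger would be passed instead:
#   from clearml import Logger
#   report_accuracy(Logger.current_logger(), 0.93, epoch=1)
demo_logger = RecordingLogger()
report_accuracy(demo_logger, 0.93, epoch=1)
```

The title/series pair reported here is what you would then reference when selecting the metric.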
@SuccessfulKoala55 Kind reminder again, thanks and sorry!
I hacked around the problem by setting api.files_server for the agent to the public URL, but ideally I'd avoid going through the reverse proxy if there's some path_substitution equivalent for this. Thanks
Oh, I misunderstood the docs/examples then, sorry. I'm using pytorch-ignite.
Thanks for the tip!
Not a ClearML employee (just a recent user), but maybe this will help? None
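from ignite.engine import Events
from ignite.handlers import ModelCheckpoint, global_step_from_engine

# Assumes model, trainer, val_evaluator and score_function are defined elsewhere.
# Keep the two best checkpoints, ranked by score_function.
model_checkpoint = ModelCheckpoint(
    "checkpoint",
    n_saved=2,
    filename_prefix="best",
    score_function=score_function,
    score_name="accuracy",
    global_step_transform=global_step_from_engine(trainer),
)
# Save the model after every epoch of val_evaluator is completed
val_evaluator.add_event_handler(
    Events.COMPLETED, model_checkpoint, {"model": model}
)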
Tried but it didn't help. I suspect the issue is here: "'docker', 'run', '-t', '--gpus', '"device=0"', '-v', '/tmp/ssh-krPvUxRks5/agent.1949:/tmp/ssh-krPvUxRks5/agent.1949', '-e', 'SSH_AUTH_SOCK=/tmp/ssh-krPvUxRks5/agent.1949'"
It passes the SSH socket instead of the .ssh directory (not sure why; an agent I have running on my own machine behaves differently). Do you happen to know how to fix this? Thanks!
Weird. When I spawn the agent with sudo I get this behaviour. Without sudo everything works fine.
Perfect, exactly what I needed, thanks!
@SuccessfulKoala55 kind reminder not to miss this when you find the time, thanks!
The issue was that .ssh wasn't propagated, so the git repository couldn't be cloned.
It seems that task.set_base_docker must be called with docker_image as well (otherwise docker_arguments don't propagate), not sure if it's a bug or not, but I have a workaround now, thanks!
Yeah, I'm starting to lean towards the enterprise solution more and more 😁
Thanks!
I know about clearml.conf but wanted to avoid ssh-ing through 50 instances to edit it.
task.set_base_docker does the job, but docker_arguments doesn't propagate if I leave docker_image as None (it just uses both image and arguments from clearml.conf of the agent). If I explicitly state docker_image and docker_arguments in task.set_base_docker it works fine.
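Roughly what the workaround looks like, as a sketch rather than the official way to do it. It assumes task.set_base_docker accepts docker_image and docker_arguments keywords; the configure_docker helper, the FakeTask stand-in, the image name and the .ssh mount are all hypothetical placeholders:

```python
class FakeTask:
    """Hypothetical stand-in exposing the set_base_docker keywords used below."""

    def __init__(self):
        self.docker = None

    def set_base_docker(self, docker_image=None, docker_arguments=None):
        # Record what would be stored on the task.
        self.docker = (docker_image, docker_arguments)


def configure_docker(task, image, arguments):
    """Always pass the image together with the arguments (the workaround described above)."""
    if not image:
        raise ValueError("docker_image must be set explicitly alongside docker_arguments")
    task.set_base_docker(docker_image=image, docker_arguments=arguments)


# With ClearML installed, a real task would be passed instead:
#   from clearml import Task
#   task = Task.init(project_name="demo", task_name="docker-args")  # hypothetical names
demo_task = FakeTask()
configure_docker(
    demo_task,
    image="my-registry/my-image:latest",       # placeholder image name
    arguments="-v /home/user/.ssh:/root/.ssh",  # placeholder mount
)
```

The guard on image is just there to make the "don't leave docker_image as None" part explicit; drop it if the agent-side default image is what you want.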
Failed to initialize NVML: Unknown Error