Reputation
Badges 1
22 × Eureka! model_checkpoint = ModelCheckpoint(
"checkpoint",
n_saved=2,
filename_prefix="best",
score_function=score_function,
score_name="accuracy",
global_step_transform=global_step_from_engine(trainer),
)
# Save the model after every epoch of val_evaluator is completed
val_evaluator.add_event_handler(
Events.COMPLETED, model_checkpoint, {"model": model}
)
Kind ping on this thread, thanks! 🙂
No worries, sorry for pinging, was just making sure you (or anyone else who might help) doesn't miss it 🙂
I use Task.add_requirements("requirements.txt") right before the Task.init.
In main, I parse arguments command-line, add_requirements, initialize Task and call execute_remotely. After that it's all pretty much the usual workflow. Initialize the model, setup dataloaders, optimizer and run the training. I'm using pytorch-ignite and have model checkpoint made on validation evaluator COMPL...
@<1523701087100473344:profile|SuccessfulKoala55> Kind reminder again, thanks and sorry!
Having a bit of trouble with this one (sorry for possibly dumb questions).
Are there any docs on how to add certs to the docker image? I see this ( None ) which is where letsencrypt points me to, but I'm not sure what's the proper way to do this on the webapp docker (I'd assume there's a non-hacky way to do it as others are using the same setup I'm trying to make work I guess)
I just added the secrets/keys to docker-compose.yml and restarted everything but no change.
Probably not, I'm trying to access it via external IP. Could you point me to instructions for that in the docs, I don't remember seeing it anywhere? Thanks!
To make sure I understand, I need to setup a domain with a cert and it should work, no additional ClearML config is required?
@<1523701087100473344:profile|SuccessfulKoala55> kind reminder not to miss this when you catch time, thanks!
One more related question (I hope there's a similar solution), when I log images, they appear in the UI with http://<my-ip> so they are inaccessible (they should be translated to None . Is there any path_substitution variant for this scenario in the config? I can't seem to find it in the docs. Thanks!
Additional info:
-Public URL uses HTTPS, internal traffic doesn't.
-clearml.storage fails while trying to fetch None ...
Meaning it just replaced the internal IP with the URL at some point for some reason, it doesn't exist in that form anywhere in any configs (http and public URL).
Doesn't work unfortunately 😕 Thanks either way!
clearml-1.13.1
Task.add_requirements("requirements.txt")
task = Task.init(project_name="My project", task_name="My task")
task.execute_remotely(queue_name="default")
...
I hacked around the solution by setting api.files_server for the agent to the public URL, but ideally I'd avoid going through reverse-proxy if there's some path_substitution equivalent for this. Thanks
Not ClearML employee (just a recent user), but maybe this will help? None
Perfect, exactly what I needed, thanks!
Tried but it didn't help. I suspect the issue is here: "'docker', 'run', '-t', '--gpus', '"device=0"', '-v', '/tmp/ssh-krPvUxRks5/agent.1949:/tmp/ssh-krPvUxRks5/agent.1949', '-e', 'SSH_AUTH_SOCK=/tmp/ssh-krPvUxRks5/agent.1949'"
It passes SSH socket instead of .ssh directory (not sure why, an agent I have running on my own machine behaves differently)? Do you happen to know how to fix this? Thanks!
Added -v /home/uname/.ssh:/root/.ssh and it resolved the issue. I assume this is some sort of a bug then?
Yes SSH_AUTH_SOCK is defined on the host. Should I manually add SSH mounting then through "extra flags"?
Oh, I misunderstood then docs/examples, sorry. I'm using pytorch-ignite.
Thanks for the tip!