the original task name contains a double space -> the saved checkpoint name also contains a double space -> the MODEL URL field in this checkpoint's model description in ClearML collapses the double space into a single space, so when you copy & paste it somewhere, the URL is incorrect
sounds like overkill for this problem, but I don't see any other pretty solution
I guess this could overcomplicate the UI; I don't see a good solution yet.
as a quick hack, we can just use a separate name (e.g. "best_val_roc_auc") for all metric values of the current best checkpoint. Then we can add columns showing the last value of this metric; a sketch of the idea is below
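A minimal sketch of that hack, assuming a plain training loop; compute_val_roc_auc and the project/task names are placeholders, not part of the original setup:

    from clearml import Task

    task = Task.init(project_name="demo", task_name="ablation-run")  # placeholder names
    logger = task.get_logger()

    best_roc_auc = float("-inf")
    for epoch in range(10):
        val_roc_auc = compute_val_roc_auc()  # hypothetical validation helper
        # regular per-epoch metric
        logger.report_scalar(title="val_roc_auc", series="val", value=val_roc_auc, iteration=epoch)
        if val_roc_auc > best_roc_auc:
            best_roc_auc = val_roc_auc
            # re-report the value under a dedicated name, so a "last value"
            # column on "best_val_roc_auc" always shows the current best checkpoint
            logger.report_scalar(title="best_val_roc_auc", series="val", value=best_roc_auc, iteration=epoch)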
Requirement already satisfied (use --upgrade to upgrade): celsusutils==0.0.1
it works, but it's not very helpful, since everybody can see the secret in the logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'DB_PASSWORD=password']
we're using os.getenv in the script to read the values of these secrets
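For reference, a minimal sketch of that pattern; DB_PASSWORD is the variable from the log line above:

    import os

    # read the secret injected via `docker run -e DB_PASSWORD=...`;
    # fail fast if it is missing instead of silently using an empty value
    db_password = os.getenv("DB_PASSWORD")
    if db_password is None:
        raise RuntimeError("DB_PASSWORD is not set")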
nice, thanks! I'll check whether it solves the issue first thing tomorrow morning
yeah, server (1.0.0) and client (1.0.1)
nope, the only changes we made to the config are adding web auth and the non-responsive tasks watchdog
just in case: this warning disappeared after I applied the fix from https://stackoverflow.com/questions/49638699/docker-compose-restart-connection-pool-full
self-hosted ClearML server 1.2.0
SDK version 1.1.6
okay, what do I do if it IS installed?
we often do ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at different epochs
fantastic, everything is working perfectly
thanks guys
we already have the cleanup service set up and running, so we should be good from now on
what if the cleanup service is launched using the ClearML-Agent Services container (part of the ClearML server)? Adding clearml.conf to the home directory doesn't help
two more questions about cleanup if you don't mind:
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning those up? And what is the recommended way of providing S3 credentials to the cleanup task?
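For context on the second question, SDK-side S3 credentials usually live in clearml.conf; a minimal sketch with placeholder values, not a confirmed recommendation for the cleanup task specifically:

    # clearml.conf (sketch; placeholder credentials)
    sdk {
        aws {
            s3 {
                key: "ACCESS_KEY"
                secret: "SECRET_KEY"
                region: "us-east-1"
            }
        }
    }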
oh wow, I didn't see delete_artifacts_and_models option
I guess we'll have to manually find old artifacts that are related to already deleted tasks
agent.hide_docker_command_env_vars.extra_keys: ["DB_PASSWORD=password"]
like this? or ["DB_PASSWORD", "password"]
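For comparison, the agent-side masking setting is usually written as a block in clearml.conf; a sketch assuming extra_keys takes just the variable names to hide, not key=value pairs:

    # clearml.conf (sketch): mask DB_PASSWORD's value in printed docker commands
    agent {
        hide_docker_command_env_vars {
            enabled: true
            extra_keys: ["DB_PASSWORD"]
        }
    }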
after the very first click, a popup asks for credentials; nothing happens after that
some of the "tasks.get_all_ex" POST requests fail, as far as I can see
yeah, I was thinking mainly about AWS. We use force to make sure we are using the correct latest checkpoint, but this increases costs when we run a lot of experiments
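A sketch of that trade-off, assuming "force" refers to something like StorageManager's force_download flag (an assumption, not confirmed above); a forced fetch always re-downloads from S3 and adds transfer cost:

    from clearml import StorageManager

    # force_download=True always re-fetches the object from S3 (correct but costly);
    # with False, an unchanged file is served from the local cache instead
    checkpoint_path = StorageManager.get_local_copy(
        remote_url="s3://my-bucket/checkpoints/model_best.pt",  # placeholder URL
        force_download=True,
    )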