that's right
for example, there are tasks A, B, C
we run multiple experiments for A, finetune some of them in separate tasks, then choose one or more best checkpoints, run some experiments for task B, choose the best experiment, and finally run task C
so we get a chain of tasks: A - A-ft - B - C
a ClearML pipeline doesn't quite work here because we'd like to analyze the results of each step before starting the next task
but it would be great to see predecessors of each experiment in the chain
another stupid question - what is the proper way to delete a worker? so far I've been using pgrep to find the relevant PID 😃
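(for the record, a sketch of what I've been doing, plus what I believe is the supported way via the agent's own `--stop` flag; worth double-checking against the docs:)

```
# what I've been doing: find the agent PID by its command line, then kill it
pgrep -f "clearml-agent daemon"

# what I believe is the supported way: ask the agent to stop its own daemon
clearml-agent daemon --stop
```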
I guess this could overcomplicate the UI; I don't see a good solution yet.
as a quick hack, we can just use a separate series name (e.g. "best_val_roc_auc") for the metric values of the current best checkpoint. then we can just add columns with the last value of this metric
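a minimal sketch of that hack; the actual ClearML reporting calls are left as comments, and the metric/series names (`val_roc_auc`, `best_val_roc_auc`) are just the example from above:

```python
# Sketch of the "dedicated series for the best checkpoint" hack.
# The ClearML Logger.report_scalar calls are left as comments.

class BestMetricTracker:
    """Tracks the running best value of a higher-is-better metric."""

    def __init__(self):
        self.best = None

    def update(self, value):
        """Return True if `value` is a new best."""
        if self.best is None or value > self.best:
            self.best = value
            return True
        return False


tracker = BestMetricTracker()
for epoch, roc_auc in enumerate([0.71, 0.74, 0.73, 0.78]):
    # Logger.current_logger().report_scalar("val", "roc_auc", roc_auc, epoch)
    if tracker.update(roc_auc):
        # re-report under the dedicated series, so the *last* value of
        # "best_val_roc_auc" always equals the current best checkpoint:
        # Logger.current_logger().report_scalar("val", "best_val_roc_auc", roc_auc, epoch)
        pass

print(tracker.best)  # 0.78
```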
it works, but it's not very helpful since everybody can see a secret in logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'DB_PASSWORD=password']
just DMed you a screenshot where you can see a part of the token
after the very first click, there is a popup requesting credentials. nothing happens after that
yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices
got it, thanks, will try to delete older ones
okay, what do I do if it IS installed?
more like collapse/expand, I guess. or pipelines that you can compose after running experiments, to see how the experiments are connected to each other
fantastic, everything is working perfectly
thanks guys
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
thanks! I need to read all parts of the documentation really carefully =) for some reason, I couldn't find this section
agent.hide_docker_command_env_vars.extra_keys: ["DB_PASSWORD=password"]
like this? or ["DB_PASSWORD", "password"]
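for reference, my understanding (worth double-checking against the ClearML docs) is that `extra_keys` takes just the variable *names*, not key=value pairs, so in `clearml.conf` it would look roughly like:

```
agent {
    hide_docker_command_env_vars {
        enabled: true
        extra_keys: ["DB_PASSWORD"]
    }
}
```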
we have a bare-metal server with ClearML agents, and sometimes there are hanging containers or containers that consume too much RAM. unless I explicitly add a container name in the container arguments, it gets a random name, which is not very convenient. it would be great if we could set a default container name for each experiment (e.g., the experiment ID)
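as the current workaround, this is roughly what I add under the experiment's container arguments (the name itself is just an example):

```
# Execution → Container → Arguments
--name my_experiment_container
```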
hard to say, maybe just “related experiments” in experiment info would be enough. I’ll think about it
thanks, this one worked after we changed the package version
sorry, my bad, after some manipulations I made it work. I have to manually change HTTP to HTTPS in the config file for the Web and Files (not API) servers after initialization, but other than that it works
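for anyone hitting the same thing, this is roughly what the host entries in `clearml.conf` look like after the change (the domains are placeholders, not our actual hosts):

```
api {
    web_server: https://app.example.com
    files_server: https://files.example.com
    api_server: http://api.example.com  # API server stays on HTTP, as described
}
```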
yeah, it works for new projects and for old projects that already had a description
it will probably screw up my resource monitoring plots, but well, who cares 😃
this is how it looks if I zoom in on the epochs that ran before the crash
this would be great. I could just then pass it as a hyperparameter