right now we can pass github secrets to the clearml agent training containers (CLEARML_AGENT_GIT_PASS) to install private repos
we need a way to pass secrets to access our database with annotations
not necessarily, there are rare cases when a container keeps running after the experiment is stopped or aborted
will do!
yes. we upload artifacts to Yandex.Cloud S3 using ClearML. we set "s3://storage.yandexcloud.net/clearml-models" as the output uri parameter and add this section to the config:
{
    host: "http://storage.yandexcloud.net"
    key: "KEY"
    secret: "SECRET_KEY"
    secure: true
}
this works like a charm, but the download button in the UI is not working
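for reference, a small sketch of how that output uri breaks down, assuming the usual convention for non-AWS S3 storage where the endpoint host comes first and the bucket is the first path segment. `split_s3_uri` is a made-up helper name for illustration, not something from ClearML:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str, str]:
    """Split an s3://host/bucket/key URI into (host, bucket, key prefix)."""
    parsed = urlparse(uri.strip())
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 uri: {uri!r}")
    # The first path segment is treated as the bucket, the rest as the key prefix.
    bucket, _, prefix = parsed.path.lstrip("/").partition("/")
    return parsed.netloc, bucket, prefix


host, bucket, prefix = split_s3_uri("s3://storage.yandexcloud.net/clearml-models")
print(host, bucket, repr(prefix))
```

the host part is what should line up with the host entry in the credentials section.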
JIC - trains still works after that, it's just that the new user is not added and hence is not able to login
we’re using the latest ClearML server and client versions (1.2.0)
that's right
for example, there are tasks A, B, C
we run multiple experiments for A, finetune some of them in separate tasks, then choose one or more best checkpoints, run some experiments for task B, choose the best experiment, and finally run task C
so we get a chain of tasks: A - A-ft - B - C
ClearML pipeline doesn't quite work here because we would like to analyze the results of each step before starting the next task
but it would be great to see predecessors of each experiment in the chain
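to make the idea concrete, here is a rough sketch of what "predecessors" could mean as data, assuming each experiment simply records one parent. the mapping and the `predecessors` helper are hypothetical, just to illustrate the chain, not an existing ClearML feature:

```python
from typing import Dict, List, Optional

# Hypothetical experiment names; each entry maps an experiment to its predecessor.
PARENTS: Dict[str, Optional[str]] = {
    "A": None,
    "A-ft": "A",
    "B": "A-ft",
    "C": "B",
}


def predecessors(experiment: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from the root experiment down to `experiment`."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = experiment
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    # The walk goes child -> root, so reverse it for display.
    return chain[::-1]


print(" - ".join(predecessors("C", PARENTS)))
```

with one parent link per experiment, the whole chain can be rebuilt after the fact, without defining a pipeline up front.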
I guess this could overcomplicate the UI. I don't see a good solution yet.
as a quick hack, we can just use a separate name (e.g. "best_val_roc_auc") for all metric values of the current best checkpoint. then we can just add columns with the last value of this metric
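a minimal sketch of that hack, assuming higher is better and that metrics arrive as a dict once per epoch. `BestCheckpointMetrics` and the `report` callback are hypothetical names; in a real run the callback would wrap whatever scalar-reporting call the training code already uses:

```python
from typing import Callable, Dict, Optional


class BestCheckpointMetrics:
    """Re-report all metrics under a "best_" name whenever the key metric improves."""

    def __init__(self, report: Callable[[str, float, int], None],
                 key: str = "val_roc_auc", prefix: str = "best_") -> None:
        self.report = report
        self.key = key
        self.prefix = prefix
        self.best: Optional[float] = None

    def update(self, metrics: Dict[str, float], iteration: int) -> bool:
        """Return True if this epoch is the new best checkpoint."""
        value = metrics[self.key]
        if self.best is not None and value <= self.best:
            return False
        self.best = value
        for name, metric_value in metrics.items():
            self.report(self.prefix + name, metric_value, iteration)
        return True


reported = []
tracker = BestCheckpointMetrics(lambda name, value, it: reported.append((name, value, it)))
for epoch, auc in enumerate([0.71, 0.78, 0.75, 0.80]):
    tracker.update({"val_roc_auc": auc}, epoch)
print(reported[-1])
```

since the "best_" series is only written on improvement, its last value is always the value at the current best checkpoint, which is what the table column would show.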
it works, but it's not very helpful since everybody can see the secret in the logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'DB_PASSWORD=password']
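what I'd expect instead is something like the sketch below: the value after `-e` is masked before the command is printed. this is only an illustration of the idea, not the agent's actual code, and matching by variable name is my assumption (`mask_env_values` is a made-up helper):

```python
from typing import Iterable, List

MASK = "********"


def mask_env_values(command: List[str], hidden_keys: Iterable[str]) -> List[str]:
    """Return a copy of a docker command with selected -e/--env values masked."""
    hidden = set(hidden_keys)
    masked: List[str] = []
    previous = None
    for arg in command:
        # Only touch KEY=value arguments that directly follow -e / --env.
        if previous in ("-e", "--env") and "=" in arg:
            key = arg.partition("=")[0]
            if key in hidden:
                arg = f"{key}={MASK}"
        masked.append(arg)
        previous = arg
    return masked


cmd = ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'DB_PASSWORD=password']
print("Executing:", mask_env_values(cmd, ["DB_PASSWORD"]))
```

the container would still get the real value, only the logged copy of the command changes.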
okay, what do I do if it IS installed?
more like collapse/expand, I guess. or pipelines that you can compose after running experiments to see that experiments are connected to each other
fantastic, everything is working perfectly
thanks guys
the weird part is that the old job continues running when I recreate the worker and enqueue the new job
thanks! I need to read all parts of documentation really carefully =) for some reason, couldn't find this section
agent.hide_docker_command_env_vars.extra_keys: ["DB_PASSWORD=password"]
like this? or ["DB_PASSWORD", "password"]
we have a baremetal server with ClearML agents, and sometimes there are hanging containers or containers that consume too much RAM. unless I explicitly add a container name in the container arguments, it will have a random name, which is not very convenient. it would be great if we could set a default container name for each experiment (e.g., the experiment id)
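roughly what I do by hand today, as a sketch: build a `--name` argument from the experiment id and put it in the container arguments. the helper names and the "clearml" prefix are made up; the character filter follows docker's naming rule (alphanumeric first character, then letters, digits, `_`, `.` or `-`):

```python
import re


def container_name(experiment_id: str, prefix: str = "clearml") -> str:
    """Build a docker-safe container name from an experiment id."""
    # Replace anything docker does not allow in a container name with "-".
    safe = re.sub(r"[^a-zA-Z0-9_.-]", "-", experiment_id.strip())
    if not safe:
        raise ValueError("experiment id is empty")
    return f"{prefix}-{safe}"


def docker_name_argument(experiment_id: str) -> str:
    """Return the extra docker argument that names the container."""
    return f"--name {container_name(experiment_id)}"


# The id below is a placeholder, not a real experiment.
print(docker_name_argument("4f2a9c0de1b74c7a9d3e5f6a7b8c9d0e"))
```

with a name like that, `docker ps` output maps straight back to the experiment that started the container.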
hard to say, maybe just “related experiments” in experiment info would be enough. I’ll think about it
thanks, this one worked after we changed the package version
sorry, my bad, after some manipulations I made it work. I have to manually change HTTP to HTTPS in the config file for the Web and Files (not API) servers after initialization, but besides that it works
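in case it helps someone else, the manual change amounts to the sketch below. the hostnames are placeholders, and I'm assuming the three server entries are named `web_server`, `api_server` and `files_server`; `force_https` is just an illustrative helper that works on a plain dict, not on the config file itself:

```python
from typing import Dict, Iterable


def force_https(servers: Dict[str, str],
                keys: Iterable[str] = ("web_server", "files_server")) -> Dict[str, str]:
    """Return a copy of the server URLs with http:// switched to https:// for `keys`."""
    updated = dict(servers)
    for key in keys:
        url = updated.get(key, "")
        if url.startswith("http://"):
            updated[key] = "https://" + url[len("http://"):]
    return updated


# Placeholder hostnames, not real servers.
servers = {
    "web_server": "http://app.example.com",
    "api_server": "https://api.example.com",  # left untouched
    "files_server": "http://files.example.com",
}
print(force_https(servers))
```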
yeah, it works for new projects and for old projects that already had a description
this is how it looks if I zoom in on the epochs that ran before the crash