we're using os.getenv in the script to read the values of these secrets
it works, but it doesn't help much, since anyone can see the secret in the logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '-e', 'DB_PASSWORD=password']
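what we're after is roughly this: before the command is logged, replace the value of every `KEY=value` argument whose key is on a list of secret names. a minimal sketch in plain Python, assuming secrets are matched by variable name only. `mask_env_values` is a hypothetical helper, not how the ClearML agent actually does it:

```python
# Hypothetical helper (not ClearML code): mask the values of selected
# environment variables in a `docker run` argument list before logging it.
def mask_env_values(cmd, secret_keys, mask="********"):
    masked = []
    for arg in cmd:
        key, sep, _value = arg.partition("=")
        # Only "KEY=value" arguments whose KEY is a known secret name are masked
        if sep and key in secret_keys:
            masked.append(f"{key}={mask}")
        else:
            masked.append(arg)
    return masked


cmd = ["docker", "run", "-t", "--gpus", '"device=0"', "-e", "DB_PASSWORD=password"]
out = mask_env_values(cmd, {"DB_PASSWORD"})
print("Executing:", out)
```

the `--gpus` value also contains `=`, but its "key" (`"device`) isn't on the secret list, so it's left alone.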
right now we can pass GitHub secrets (CLEARML_AGENT_GIT_PASS) to the ClearML agent training containers to install private repos
we also need a way to pass the secrets that give access to our database with annotations
parents and children. maybe tags, maybe a separate tab or section, idk. I wonder if anyone else is interested in this functionality; for us it's a very common case
agent.hide_docker_command_env_vars.extra_keys: ["DB_PASSWORD=password"]
like this? or ["DB_PASSWORD", "password"]?
tags are somewhat fine for this, I guess, but there will be too many of them eventually, and they don't reflect the sequential nature of the experiments
hard to say, maybe just “related experiments” in the experiment info would be enough. I’ll think about it
that's right
for example, say there are tasks A, B, and C
we run multiple experiments for A, fine-tune some of them in separate tasks, choose one or more of the best checkpoints, run some experiments for task B, pick the best one, and finally run task C
so we get a chain of tasks: A -> A-ft -> B -> C
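to make the "predecessors" idea concrete: if each experiment recorded which experiment(s) it took a checkpoint from, the chain is just a walk back along those links. a minimal sketch, assuming a hypothetical `parents` map (nothing like this exists in ClearML today, that's the point of the request); it's a list per experiment because B can start from more than one checkpoint:

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage map: experiment -> experiments whose checkpoints it used.
parents: Dict[str, List[str]] = {
    "A": [],
    "A-ft": ["A"],
    "B": ["A-ft"],
    "C": ["B"],
}


def ancestors(exp_id: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return all predecessors of exp_id, nearest first (breadth-first)."""
    seen: List[str] = []
    queue = deque(parents.get(exp_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # skip experiments already reached via another path
            continue
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


print(ancestors("C", parents))
```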
ClearML pipelines don't quite work here, because we want to analyze the results of each step before starting the next task
but it would be great to see the predecessors of each experiment in the chain
nope, that's the point: quite often we run experiments separately, but they are related to each other. currently there's no way to see that one experiment uses a checkpoint from a previous one, since we have to insert the S3 link manually as a hyperparameter. it would be useful to see these connections. maybe instead of grouping, we could see which experiments use the artifacts of this experiment
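the "which experiments use this experiment's artifacts" view is basically a reverse lookup: take the artifact URIs an experiment produced and find the experiments whose hyperparameters contain one of them. a minimal sketch over hypothetical records; the bucket paths and the `init_checkpoint` name are made up, and this isn't a ClearML API:

```python
from typing import Dict, List

# Hypothetical records: artifacts each experiment produced, and the
# hyperparameters it was launched with (S3 checkpoint links pasted by hand).
experiments: Dict[str, dict] = {
    "A": {"artifacts": ["s3://bucket/A/model.pt"], "hyperparams": {}},
    "A-ft": {
        "artifacts": ["s3://bucket/A-ft/model.pt"],
        "hyperparams": {"init_checkpoint": "s3://bucket/A/model.pt"},
    },
    "B": {
        "artifacts": [],
        "hyperparams": {"init_checkpoint": "s3://bucket/A-ft/model.pt"},
    },
}


def consumers(producer: str, experiments: Dict[str, dict]) -> List[str]:
    """Return experiments whose hyperparameter values reference an artifact of `producer`."""
    produced = set(experiments[producer]["artifacts"])
    return [
        exp_id
        for exp_id, rec in experiments.items()
        if exp_id != producer
        and produced & {str(v) for v in rec["hyperparams"].values()}
    ]


print(consumers("A", experiments))
```

matching on exact URI strings only works if the link is pasted verbatim; that fragility is part of why a built-in link would be nicer.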
Error 12 : Validation error (value '['13b46b9325954517ab99381d5f45237d', 'bc76c3a7f0f6431b8e064212e9bdd2c0', '5d2a57cd39b94250b8c8f52303ccef92', 'e4731ee5b33e41d992d6d3fdb2913045', '698d9231155e41fbb61f8f3faa605727', '2171b190507f40d1be35e222045c58ea', '55c81a5db0ad40bebf72fdcc1b3be2a4', '94fbdbe26ef242d793e18d955cb3de58', '7d8a6c8f2ae246478b39ae5e87def2ad', '141594c146fe495886d477d9a27c465f', '640f87b02dc94a4098a0aba4d855b8f5']' length is bigger than allowed maximum '10'.)
we often run ablation studies with more than 50 experiments, and it was very convenient to compare their dynamics at different epochs
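the only workaround under a 10-item limit is comparing in groups of at most 10, which loses the side-by-side view across the whole ablation. a minimal sketch of the split; the IDs below are placeholders, not real experiment IDs:

```python
from typing import List, Sequence

MAX_COMPARE = 10  # the maximum from the validation error message


def batches(ids: Sequence[str], size: int = MAX_COMPARE) -> List[List[str]]:
    """Split ids into consecutive groups of at most `size` items."""
    return [list(ids[i:i + size]) for i in range(0, len(ids), size)]


ids = [f"exp{i:02d}" for i in range(53)]  # placeholder IDs for a 53-run ablation
groups = batches(ids)
print([len(g) for g in groups])
```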
fantastic, everything is working perfectly
thanks guys
more like collapse/expand, I guess. or pipelines that you can compose after running the experiments, to show that they are connected to each other
I'm not sure it's related to the domain switch, since we also upgraded to the newest ClearML server version at the same time
if you click on the experiment name here, you get a 404 because the link looks like this:
https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID
when it should look like this:
https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
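the only difference between the two links is the missing `experiments/` segment. a small sketch that rewrites the broken form into the working one, just to pin down the expected layout; `fix_experiment_link` is a hypothetical client-side helper, the real fix belongs in the web server:

```python
from urllib.parse import urlsplit, urlunsplit


def fix_experiment_link(url: str) -> str:
    """Rewrite /projects/<pid>/<eid> into /projects/<pid>/experiments/<eid>."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Only the broken three-segment form is touched; correct links pass through
    if len(segments) == 3 and segments[0] == "projects":
        segments.insert(2, "experiments")
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))


print(fix_experiment_link("https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID"))
```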
sorry, my bad, after some tweaking I got it working. after initialization I have to manually change HTTP to HTTPS in the config file for the Web and Files servers (not the API server), but other than that it works
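the manual step is: switch only the web and files server URLs to https and leave the API server as is. a minimal sketch on a plain dict, assuming the config file is the `clearml.conf` written by `clearml-init`, whose `api` section has `web_server`, `api_server` and `files_server`; the hostnames are placeholders, and this doesn't read or write the actual file:

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(api_cfg: dict, keys=("web_server", "files_server")) -> dict:
    """Return a copy of api_cfg with the listed server URLs switched from http to https."""
    fixed = dict(api_cfg)
    for key in keys:
        parts = urlsplit(fixed[key])
        if parts.scheme == "http":
            fixed[key] = urlunsplit(parts._replace(scheme="https"))
    return fixed


# Placeholder values mirroring the `api` section of clearml.conf
cfg = {
    "web_server": "http://app.example.com",
    "api_server": "http://api.example.com",
    "files_server": "http://files.example.com",
}
print(force_https(cfg))
```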
I updated the S3 credentials; I'll check later whether they work
it doesn't explain why we can't delete logged images and texts, though