Hi MelancholyChicken65
I'm assuming you need the SSH protocol rather than an HTTPS user/token. Set this one to true: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
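For reference, the relevant bit of clearml.conf looks roughly like this (a sketch of the setting, in the agent section):
agent {
    # convert HTTPS git URLs to SSH so the agent clones over SSH
    force_git_ssh_protocol: true
}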
Hi AstonishingRabbit13
is there option to omit the task_id so the final output will be deterministic and know prior to the task run?
Actually no 🙂 the full path is unique for the run, so you do not end up overwriting models.
You can get the full path from the UI (Models tab) or programmatically with Model.query_models or the Task.get_task methods.
What's the idea behind a fixed location for the model?
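For example, a minimal sketch of the programmatic route (the task id is a placeholder):
from clearml import Task

# fetch the task and print the remote path of each output model
task = Task.get_task(task_id="<task id>")
for model in task.models["output"]:
    print(model.name, model.url)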
AstonishingRabbit13 so is it working now?
Hi AstonishingRabbit13
now I'm training yolov5 and I want to save all the info (model and metrics) with clearml to my bucket.
The easiest thing (assuming you are running YOLOv5 with python train.py) is to add the following env variable (the bucket URI is a placeholder):
CLEARML_DEFAULT_OUTPUT_URI="gs://<your-bucket>" python train.py
Notice that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
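For reference, the GS credentials section of clearml.conf looks roughly like this (values are placeholders):
sdk {
    google.storage {
        # project: "my-gcp-project"
        # credentials_json: "/path/to/credentials.json"
    }
}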
the only thing that's missing is some plots on the clearml server (app). When I go to the details of the training I cannot see the confusion matrix, for example (but it exists on the bucket)
How do you report the "confusion matrix"? (I might have an idea on what the difference is)
it should all be logged at the end, as I understand
Hmm let me check the code for a minute
Oh that makes sense.
So now you can get the models as a dict as well (clearml lets you access them both as a list, so it is easy to get the last one created, and as a dict, so you can match the filenames)
This one will get the list of models:
print(task.models["output"].keys())
Now you can just pick the best one:
model = task.models["output"]["epoch13-..."]
my_model_file = model.get_local_copy()
Hi MistakenDragonfly51
Notice that Models are their own entity, you can query them based on tags/projects/names etc.
Querying and getting Models is done by Model class:
https://clear.ml/docs/latest/docs/references/sdk/model_model#modelquery_models
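For example, a minimal sketch (project and model names are placeholders):
from clearml import Model

# query the model repository by project / name (tags also work)
models = Model.query_models(project_name="my_project", model_name="yolov5")
for m in models:
    print(m.id, m.name, m.url)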
task.get_models()
is always empty.
How come there are no Models on the Task? (in other words how come this is empty?)
Wait, that makes no sense to me. The API from python and the API from the UI are getting the same data from the backend ...
What are you getting with:
from clearml import Task
task = Task.get_task(task_id="<put task id here>")
print(task.models)
Fixed in: pip install clearml==1.8.1rc0
🙂
Well it seems we forgot that one 🙂 I'll quickly make sure it is there.
As a quick solution (no need to upgrade):
task.models["output"]._models.keys()
That might be me, let me check...
CheerfulGorilla72 as I understand there were some delays with the current release, so it is going to be out this week. The one after that includes this feature and, as far as I understand, would be mid Dec.
Hi CheerfulGorilla72
is it ideological...
Lol, no 🙂
Since some of the comparisons are done client side (browser, mostly the text comparisons) it is a bit heavy, so we added a limit. We want to change it so some of it is done on the backend, but in the meantime we can actually expand the limit, and maybe only lazily compare the text areas. Hopefully in the next version 🤞
Hi Guys,
I hear you guys, and I know this is planned but probably got bumped down in priority.
I know the main issue is the "Execution Tab" comparison, the rest is not an issue.
Maybe a quick hack to only compare the first 10 in the Execution, and remove the limit on the others? (The main issue with the execution is the git-diff / installed-packages comparison, which is quite taxing on the FE)
Thoughts ?
I want to be able to compare scalars of more than 10 experiments, otherwise there is no strong need yet
Makes sense. In the next version, not the one that will be released next week, the one after with reports (shhh don't tell anyone 🙂), they tell me this is solved 🙂
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start, it should solve the issue
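For illustration, the relevant bit of the docker-compose.yml looks roughly like this (a sketch; use your own key pair):
  agent-services:
    environment:
      CLEARML_API_ACCESS_KEY: <your access key>
      CLEARML_API_SECRET_KEY: <your secret key>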
execution_queue
is not relevant anymore
Correct
total_max_jobs
is determined by how many machines I launch the script on
Actually this is the number of concurrent subprocesses that are launched on your machine. Notice that local execution means all experiments are launched on the machine that started the HPO process.
Maybe to clarify, I was looking for something with the more classic Ask-and-Tell interface
so the way to connect "ask" in the model, is to just...
if so is there any doc/examples about this?
Good point, passing to docs 🙂
https://github.com/allegroai/clearml/blob/51af6e833ddc5a8ba1efaaf75980f58616b25e85/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L123
I mean it is mentioned, but we should highlight it better
Hi MistakenDragonfly51
Is it possible to use it without using the clearml agent system?
Yes it is, which would mean everything is executed locally
basically:
an_optimizer.start_locally()
instead of this line
https://github.com/allegroai/clearml/blob/51af6e833ddc5a8ba1efaaf75980f58616b25e85/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L121
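Putting it together, a minimal sketch of running the optimizer locally (the base task id, metric names, and parameter range are placeholders):
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformIntegerParameterRange

an_optimizer = HyperParameterOptimizer(
    base_task_id="<template task id>",
    hyper_parameters=[
        UniformIntegerParameterRange("General/epochs", min_value=5, max_value=20, step_size=5),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,  # concurrent subprocesses on this machine
    total_max_jobs=20,                 # overall cap on launched experiments
)
# run everything on this machine instead of enqueueing to agents
an_optimizer.start_locally()
an_optimizer.wait()
an_optimizer.stop()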
it's saved in a
lightning_logs
folder where I started the script instead.
It should be saved there + it should upload it to your file server
Can you send the Task log? (this is odd)
Hi MistakenDragonfly51
I'm trying to set
default_output_uri
in
This should be set either on your client side, or on the worker machine (running the clearml-agent).
Makes sense?
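For example, on the client side in clearml.conf (a sketch; the URI is a placeholder):
sdk {
    development {
        # default destination for models/artifacts created by new tasks
        default_output_uri: "gs://<your-bucket>/models"
    }
}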
yey working 🙂
I see, let me check the code and get back to you, this seems indeed like an issue with the Triton configuration in the model monitoring scenario.
Hmm is "model_monitoring_eps" another version of the model and it does not have all the properties of the "original" one?
MelancholyChicken65 found it! Thank you for finding this issue.
I'm hoping to get an update soon 🙂
MelancholyChicken65 which clearml-serving version are you using? (I believe this issue was fixed in 1.2)
so other processes can use it
This is why there is a model repository: you can query the last model created, or query by name or tag, or query the Task that created it and then, via the Task, get the model and its location.
This is a stable way to make sure your application code (the one using the model) will get to use stable models regardless of the training processes.
I would add a Tag to the model and then search based on the project and the tag, wdyt?
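A minimal sketch of that query (project and tag names are placeholders):
from clearml import Model

# fetch a model in the project carrying the tag
models = Model.query_models(project_name="my_project", tags=["production"], max_results=1)
if models:
    local_path = models[0].get_local_copy()
    print(local_path)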
I would like to have it also saved on the bucket
oh if this is the case, you can just change the clearml file server to point to a GS bucket, everything will be stored there.
Just change your clearml.conf (the bucket URI is a placeholder):
files_server: "gs://<your-bucket>"
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L10
is there something else in the conf that i should change ?
I'm assuming the google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113