Hi SarcasticSparrow10
which database services are used to...
Mongo & Elastic
You can query everything using ClearML interface, or talk directly with the databases.
Full RestAPI is here:
https://clear.ml/docs/latest/docs/references/api/endpoints
You can use the APIClient for easier pythonic interface:
See example here
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
What is the exact use case you have in mind?
Hi AgitatedDove14 Thanks, I'll check these out.
What is the exact use case you have in mind?
I want to store some additional data that is not relevant to training a model. For example, store inference results, explanations, etc and then use them in a different process. I currently use separate database for this.
Btw, I had been busy with another project and hadn't logged in here for some time. I see that you guys have made a lot of progress in the last two months! I'm excited to dig in π
For example, store inference results, explanations, etc and then use them in a different process. I currently use separate database for this.
You can use artifacts for complex data then retrieve them programatically.
Or you can manually report scalers / plots etc, with Logger
class, also you can retrive them with task.get_last_scalar_metrics
I see that you guys have made a lot of progress in the last two months! I'm excited to dig inΒ
Thank you!
You can further dig with Task.get_tasks
to get / filter / sort tasks based on any metric reported.
Ok, I will look into artifacts. However, I will probably need high performance query functionality. For example, say I have a model and hundreds of thousands of inference records for that model. I want to be able to efficiently query that. My guess is that wouldn't be possible with artifacts. But that should be possible with Task.get_tasks
.
Also it might be better (although not necessary) to have a separate collection for storing inference results for better organization.
I have a model and hundreds of thousands of inference records for that model.
What would be the query ? Are you reporting 100+ diff scalars ?
What would be the query ? Are you reporting 100+ diff scalars ?
At the moment I am not reporting any scalars related to inference. I'm only reporting data related to training a model. But I would like to report records that result from an inference process. For example the record would contain key_1, key_2, datetime, pred_1, pred_2 ... pred_n. I would have about 20 scalars if each of these fields is reported as a scalar.
The query can be a simple filtering criteria matching some keys or a more complex one which requires aggregation.
Ohh if this is the case, and this is a stream of constant inference Results, then yes, you should push it to some stream supported DB.
Simple SQL tables would work, but for actual scale I would push into a Kafka stream then pull it (serially) somewhere else and push into a DB