
Hi, which database services are used to store the logged data, such as scalars, text, matrices, etc.? How can I query this data programmatically for a downstream process, instead of only within the web UI? If scalar data is stored in MongoDB, can I use pymongo to retrieve it?

  
  
Posted 3 years ago

Answers 10


Got it. That makes sense. Thanks!

  
  
Posted 3 years ago

Hi SarcasticSparrow10

which database services are used to...

Mongo & Elastic
You can query everything using the ClearML interface, or talk directly to the databases.
The full REST API reference is here:
https://clear.ml/docs/latest/docs/references/api/endpoints
You can use the APIClient for an easier, Pythonic interface.
See an example here:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
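A rough sketch of reading tasks through the APIClient, assuming `clearml` is installed and configured. `client.tasks.get_all(...)` mirrors the REST endpoint of the same name, as in the cleanup example; the paging loop, the 0-based `page` numbering, the assumption that each call returns a list, and the helper names `iter_all` / `clearml_task_pages` are illustrative assumptions, not part of ClearML. The ClearML import is deferred so the generic paging loop can be tried without a server.

```python
from typing import Any, Callable, Iterator, List

# fetch_page(page, page_size) -> list of results for that page
FetchPage = Callable[[int, int], List[Any]]


def iter_all(fetch_page: FetchPage, page_size: int = 500) -> Iterator[Any]:
    """Yield every result from a paged endpoint, one page at a time.

    A page shorter than page_size is treated as the last one.
    """
    page = 0  # assumption: page numbers start at 0
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            break
        page += 1


def clearml_task_pages(project_id: str) -> FetchPage:
    """Build a fetch_page callable backed by ClearML's APIClient.

    Assumes clearml is installed and configured (clearml.conf or env vars).
    """
    # Deferred import: iter_all stays usable without clearml installed.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()

    def fetch(page: int, page_size: int) -> List[Any]:
        # tasks.get_all mirrors the REST endpoint of the same name;
        # assumed to return a list of task objects
        return client.tasks.get_all(
            project=[project_id],
            only_fields=["id", "name", "status"],
            page=page,
            page_size=page_size,
        )

    return fetch
```

Usage would look like `for task in iter_all(clearml_task_pages("<project-id>")): print(task.id, task.name)`. Keeping the paging loop separate from the client call means the same loop works for any paged endpoint.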

What is the exact use case you have in mind?

  
  
Posted 3 years ago

For example, store inference results, explanations, etc., and then use them in a different process. I currently use a separate database for this.

You can use artifacts for complex data, then retrieve them programmatically.
Or you can manually report scalars / plots etc. with the Logger class; you can also retrieve them with task.get_last_scalar_metrics
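A minimal sketch of the Logger / `get_last_scalar_metrics` round trip. It assumes the nested shape `{title: {series: {"last": ..., "min": ..., "max": ...}}}` that ClearML documents for `get_last_scalar_metrics()`; the helper names `report_scalars`, `fetch_last_scalars` and `flatten_last_scalars` are hypothetical. Only the flattening helper runs without a ClearML server.

```python
from typing import Dict, Tuple

# Assumed shape of Task.get_last_scalar_metrics():
# {title: {series: {"last": ..., "min": ..., "max": ...}}}
ScalarSummary = Dict[str, Dict[str, Dict[str, float]]]


def report_scalars(task, title: str, values: Dict[str, float], iteration: int) -> None:
    """Report one scalar series per key of `values` under a single title."""
    logger = task.get_logger()
    for series, value in values.items():
        logger.report_scalar(title=title, series=series, value=value, iteration=iteration)


def fetch_last_scalars(task_id: str) -> ScalarSummary:
    """Read the last/min/max summary of every reported scalar from the server."""
    from clearml import Task  # deferred: needs a configured clearml install

    return Task.get_task(task_id=task_id).get_last_scalar_metrics()


def flatten_last_scalars(metrics: ScalarSummary, field: str = "last") -> Dict[Tuple[str, str], float]:
    """Map (title, series) -> the chosen summary field ("last", "min" or "max")."""
    return {
        (title, series): values[field]
        for title, by_series in metrics.items()
        for series, values in by_series.items()
        if field in values
    }
```

Note that this summary only keeps the last/min/max per series, so it suits "latest value" lookups rather than querying every reported point.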

I see that you guys have made a lot of progress in the last two months! I'm excited to dig in

Thank you!

You can dig further with Task.get_tasks to get / filter / sort tasks based on any reported metric.
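A sketch of filtering and sorting tasks by a reported metric, done on the client side: `Task.get_tasks(project_name=...)` fetches the tasks and each task's `get_last_scalar_metrics()` summary is then ranked locally. This is an assumption-laden stand-in, not the server-side filtering the answer may have in mind; `Task.get_tasks` also takes a `task_filter` argument for that, whose metric keys are not covered here. `collect_summaries` and `rank_by_metric` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple

# task id -> summary in the get_last_scalar_metrics() shape
Summaries = Dict[str, Dict[str, Dict[str, Dict[str, float]]]]


def collect_summaries(project_name: str) -> Summaries:
    """Fetch the last-scalar summary of every task in a project."""
    from clearml import Task  # deferred: needs a configured clearml install

    return {t.id: t.get_last_scalar_metrics() for t in Task.get_tasks(project_name=project_name)}


def rank_by_metric(
    summaries: Summaries,
    title: str,
    series: str,
    field: str = "last",
    min_value: Optional[float] = None,
    descending: bool = True,
) -> List[Tuple[str, float]]:
    """Filter tasks on one reported metric and sort them by its value."""
    rows = []
    for task_id, metrics in summaries.items():
        value = metrics.get(title, {}).get(series, {}).get(field)
        if value is None:
            continue  # this task never reported the metric
        if min_value is not None and value < min_value:
            continue
        rows.append((task_id, value))
    rows.sort(key=lambda row: row[1], reverse=descending)
    return rows
```

Pulling every summary to the client is simple but scales with the number of tasks, which is fine for comparing experiments and a poor fit for per-record inference data.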

  
  
Posted 3 years ago

OK, I will look into artifacts. However, I will probably need high-performance query functionality. For example, say I have a model and hundreds of thousands of inference records for that model. I want to be able to query them efficiently. My guess is that this wouldn't be possible with artifacts, but it should be possible with Task.get_tasks.

  
  
Posted 3 years ago

What would the query be? Are you reporting 100+ different scalars?

At the moment I am not reporting any scalars related to inference, only data related to training a model. But I would like to report records that result from an inference process. For example, each record would contain key_1, key_2, datetime, pred_1, pred_2 ... pred_n. I would have about 20 scalars if each of these fields were reported as a scalar.

The query can be a simple filtering criteria matching some keys or a more complex one which requires aggregation.
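For illustration, here is how records of that shape could sit in a plain SQL table and serve both kinds of query. This sketch uses Python's built-in `sqlite3` as a stand-in for whatever SQL database is actually used; only `pred_1` and `pred_2` are shown, and the table, index, and function names are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Columns follow the record described above; add pred_3 ... pred_n the same way.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inference (
    key_1    TEXT NOT NULL,
    key_2    TEXT NOT NULL,
    datetime TEXT NOT NULL,  -- ISO-8601, so string order is time order
    pred_1   REAL,
    pred_2   REAL
)
"""


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    # Composite index so key lookups don't scan every inference record
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_inference_keys "
        "ON inference (key_1, key_2, datetime)"
    )
    return conn


def insert_records(conn: sqlite3.Connection, records: Iterable[Dict]) -> None:
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO inference VALUES (:key_1, :key_2, :datetime, :pred_1, :pred_2)",
            records,
        )


def records_for(conn: sqlite3.Connection, key_1: str, key_2: str) -> List[Tuple]:
    """Simple filtering: all predictions for one key pair, oldest first."""
    return conn.execute(
        "SELECT datetime, pred_1, pred_2 FROM inference "
        "WHERE key_1 = ? AND key_2 = ? ORDER BY datetime",
        (key_1, key_2),
    ).fetchall()


def mean_pred_1_by_key(conn: sqlite3.Connection) -> List[Tuple]:
    """Aggregation: record count and mean pred_1 per key_1."""
    return conn.execute(
        "SELECT key_1, COUNT(*), AVG(pred_1) FROM inference "
        "GROUP BY key_1 ORDER BY key_1"
    ).fetchall()
```

The index on the lookup keys is what keeps key-based filtering fast as the table grows into hundreds of thousands of rows.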

  
  
Posted 3 years ago

Hi AgitatedDove14 Thanks, I'll check these out.

What is the exact use case you have in mind?

I want to store some additional data that is not relevant to training a model. For example, store inference results, explanations, etc., and then use them in a different process. I currently use a separate database for this.

Btw, I had been busy with another project and hadn't logged in here for some time. I see that you guys have made a lot of progress in the last two months! I'm excited to dig in 🙂

  
  
Posted 3 years ago

👍

  
  
Posted 3 years ago

Ohh, if that is the case, and this is a constant stream of inference results, then yes, you should push it to a DB that supports streaming data.
Simple SQL tables would work, but for real scale I would push the results into a Kafka stream, then pull them (serially) somewhere else and push them into a DB.
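The stream-then-DB pattern can be sketched without a Kafka cluster: below, a bounded `queue.Queue` stands in for the Kafka topic, the inference side produces onto it from its own thread, and a single consumer pulls records serially and writes them to SQLite in batches. A real deployment would swap the queue for a Kafka producer/consumer client and SQLite for the target DB; all names here are hypothetical.

```python
import queue
import sqlite3
import threading
from typing import Iterable, Tuple

STOP = object()  # sentinel: the producer has no more records


def produce(stream: queue.Queue, records: Iterable[Tuple[str, float]]) -> None:
    """Inference side: push each result onto the stream as it is produced."""
    for record in records:
        stream.put(record)
    stream.put(STOP)


def consume_into_db(stream: queue.Queue, conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Storage side: pull records serially and insert them in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS inference (key_1 TEXT, pred_1 REAL)")
    batch, written = [], 0

    def flush() -> None:
        nonlocal written
        if batch:
            with conn:  # one transaction per batch
                conn.executemany("INSERT INTO inference VALUES (?, ?)", batch)
            written += len(batch)
            batch.clear()

    while True:
        item = stream.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            flush()
    flush()  # write whatever is left after the stream ends
    return written


def run_pipeline(records, batch_size: int = 100):
    stream: queue.Queue = queue.Queue(maxsize=1000)  # stand-in for a Kafka topic
    producer = threading.Thread(target=produce, args=(stream, records))
    producer.start()
    conn = sqlite3.connect(":memory:")  # created and used on this thread only
    written = consume_into_db(stream, conn, batch_size)
    producer.join()
    return conn, written
```

Consuming serially with batched inserts decouples the inference rate from the DB write rate, which is the point of putting a stream in between.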

  
  
Posted 3 years ago

Also it might be better (although not necessary) to have a separate collection for storing inference results for better organization.

  
  
Posted 3 years ago

I have a model and hundreds of thousands of inference records for that model.

What would the query be? Are you reporting 100+ different scalars?

  
  
Posted 3 years ago
907 Views
3 years ago
one year ago