Hi, Which Database Services Are Used To Store The Logged Data Such As Scalar, Text, Matrix, Etc? How Can I Query These For A Downstream Process Programmatically Instead Of Just Within The Web Ui? If Scalar Data Is Stored In Mongodb, Can I Use Pymongo To R

Answered

Hi, which database services are used to store logged data such as scalars, text, matrices, etc.? How can I query this data programmatically for a downstream process, instead of only through the web UI? If scalar data is stored in MongoDB, can I use pymongo to retrieve it?

  				
Posted 3 years ago by SarcasticSparrow10


Answers 10

Also it might be better (although not necessary) to have a separate collection for storing inference results for better organization.

  				
Posted 3 years ago by SarcasticSparrow10

Hi AgitatedDove14 Thanks, I'll check these out.

What is the exact use case you have in mind?

I want to store some additional data that is not relevant to training a model. For example, I want to store inference results, explanations, etc. and then use them in a different process. I currently use a separate database for this.

Btw, I had been busy with another project and hadn't logged in here for some time. I see that you guys have made a lot of progress in the last two months! I'm excited to dig in 🙂

  				
Posted 3 years ago by SarcasticSparrow10

Hi SarcasticSparrow10

which database services are used to...

MongoDB and Elasticsearch.
You can query everything through the ClearML interface, or talk to the databases directly.
The full REST API reference is here:
https://clear.ml/docs/latest/docs/references/api/endpoints
For a more Pythonic interface you can use the APIClient.
See the example here:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
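For reference, here is a minimal sketch of the APIClient route. It assumes clearml is installed and a clearml.conf pointing at your server is in place. The helper names `make_client` and `list_completed_tasks` and the project ID placeholder are hypothetical, and the `tasks.get_all` arguments should be checked against the REST reference linked above for your server version.

```python
def make_client():
    """Create an APIClient (needs clearml installed and clearml.conf configured)."""
    # Imported lazily so the helper below can be exercised without a server.
    from clearml.backend_api.session.client import APIClient
    return APIClient()


def list_completed_tasks(client, project_id, limit=50):
    """Return (id, name) pairs for completed tasks in one project, newest first."""
    tasks = client.tasks.get_all(
        project=[project_id],        # filter by project ID
        status=["completed"],        # only tasks that finished
        only_fields=["id", "name"],  # keep the response small
        order_by=["-last_update"],   # most recently updated first
        page_size=limit,
        page=0,
    )
    return [(t.id, t.name) for t in tasks]


# Usage against a real server (the project ID is a placeholder):
#   client = make_client()
#   for task_id, name in list_completed_tasks(client, "<project-id>"):
#       print(task_id, name)
```

The helper takes the client as an argument, so the same function can be pointed at any object that exposes `tasks.get_all`.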

What is the exact use case you have in mind?

  				
Posted 3 years ago by AgitatedDove14

👍

  				
Posted 3 years ago by AgitatedDove14

Oh, if that is the case and this is a constant stream of inference results, then yes, you should push it to a DB that supports streaming.
Simple SQL tables would work, but at real scale I would push into a Kafka stream, then pull from it (serially) somewhere else and push into a DB.
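As an illustration of the "simple SQL tables" option only, here is a small sketch using Python's built-in sqlite3 module. The table name, the column names (modelled on the key_1, key_2, datetime, pred_1, pred_2 record described elsewhere in this thread) and the helper functions are all assumptions made for the example, not part of ClearML.

```python
import sqlite3

# Hypothetical schema for inference records; adjust columns to your own data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inference_results (
    key_1  TEXT NOT NULL,
    key_2  TEXT NOT NULL,
    ts     TEXT NOT NULL,  -- ISO 8601 timestamp
    pred_1 REAL,
    pred_2 REAL
);
CREATE INDEX IF NOT EXISTS idx_results_keys
    ON inference_results (key_1, key_2);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_records(conn, records):
    """Insert an iterable of (key_1, key_2, ts, pred_1, pred_2) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO inference_results VALUES (?, ?, ?, ?, ?)", records
        )


def filter_by_key(conn, key_1):
    """Simple filtering query: all records matching one key, oldest first."""
    cur = conn.execute(
        "SELECT key_1, key_2, ts, pred_1, pred_2 "
        "FROM inference_results WHERE key_1 = ? ORDER BY ts",
        (key_1,),
    )
    return cur.fetchall()


def mean_pred_by_key(conn):
    """Aggregation query: average pred_1 per key_1."""
    cur = conn.execute(
        "SELECT key_1, AVG(pred_1) FROM inference_results "
        "GROUP BY key_1 ORDER BY key_1"
    )
    return dict(cur.fetchall())
```

The index on (key_1, key_2) is what keeps key lookups fast as the table grows. For the Kafka variant, the same `add_records` call would sit in the consumer that pulls from the stream.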

  				
Posted 3 years ago by AgitatedDove14

I have a model and hundreds of thousands of inference records for that model.

What would the query be? Are you reporting 100+ different scalars?

  				
Posted 3 years ago by AgitatedDove14

What would the query be? Are you reporting 100+ different scalars?

At the moment I am not reporting any scalars related to inference. I'm only reporting data related to training a model. But I would like to report records that result from an inference process. For example, a record would contain key_1, key_2, datetime, pred_1, pred_2 ... pred_n. I would have about 20 scalars if each of these fields were reported as a scalar.

The query can be a simple filter matching some keys, or a more complex one that requires aggregation.

  				
Posted 3 years ago by SarcasticSparrow10

Got it. That makes sense. Thanks!

  				
Posted 3 years ago by SarcasticSparrow10

OK, I will look into artifacts. However, I will probably need high-performance query functionality. For example, say I have a model and hundreds of thousands of inference records for that model. I want to be able to query those efficiently. My guess is that this wouldn't be possible with artifacts, but it should be possible with Task.get_tasks.

  				
Posted 3 years ago by SarcasticSparrow10

For example, store inference results, explanations, etc and then use them in a different process. I currently use separate database for this.

You can use artifacts for complex data and then retrieve them programmatically.
Or you can manually report scalars / plots etc. with the Logger class, and retrieve them with task.get_last_scalar_metrics.

I see that you guys have made a lot of progress in the last two months! I'm excited to dig in

Thank you!

You can dig further with Task.get_tasks to get / filter / sort tasks based on any reported metric.
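To make the "retrieve them in a different process" part concrete, here is a rough sketch that pulls the scalars reported by an existing task and flattens them into rows. It assumes clearml is installed and configured, that the task ID is known, and that `Task.get_reported_scalars()` returns the nested `{title: {series: {"x": [...], "y": [...]}}}` layout. The helper names `fetch_scalars` and `flatten_scalars` are hypothetical.

```python
def fetch_scalars(task_id):
    """Load an existing task by ID and return its reported scalars."""
    # Imported lazily so flatten_scalars can be used without clearml installed.
    from clearml import Task
    task = Task.get_task(task_id=task_id)
    return task.get_reported_scalars()


def flatten_scalars(scalars):
    """Turn {title: {series: {"x": [...], "y": [...]}}} into flat rows.

    Each row is (title, series, x, y), which is easy to load into a
    DataFrame or a SQL table for filtering and aggregation.
    """
    rows = []
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            for x, y in zip(points["x"], points["y"]):
                rows.append((title, series, x, y))
    return rows


# Usage in a downstream process (the task ID is a placeholder):
#   rows = flatten_scalars(fetch_scalars("<task-id>"))
```

This works per task, so it suits a handful of metrics per run better than hundreds of thousands of individual inference records.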

  				
Posted 3 years ago by AgitatedDove14
