Does ClearML keep track of the data versions fetched from ClearML Data?

Answered

I'm versioning my data using ClearML Data, and I'll have a training task set up with ClearML Task.
My question is: does ClearML keep track of the data versions fetched from ClearML Data?
Basically, I want to see how much of the tracking and information storage ClearML does directly, and how much I will have to do manually using a database.

  				
Posted 3 years ago by VexedCat68


Answers 11

I'm not sure about auto logging, since you might be using different datasets, or you might get a dataset but not use it depending on specific conditions. However, as a developer who chose ClearML and considers it more of an ecosystem than just a continuous training pipeline, I would want as many aspects of the MLOps process, and as much of the information around the experiment, as possible to be logged within ClearML, without having to use external databases or libraries.

  				
Posted 3 years ago by VexedCat68

That is true. If I'm understanding correctly, by configuration parameters you mean using argparse, right?

  				
Posted 3 years ago by VexedCat68

Basically, I'm trying to work out how much of the tracking and record keeping ClearML does for me, and what I need to keep track of manually in a database.

  				
Posted 3 years ago by VexedCat68

VexedCat68, do you mean does it track which version was fetched, or does it track every time a version is fetched?

  				
Posted 3 years ago by CostlyOstrich36

Understood

  				
Posted 3 years ago by VexedCat68

VexedCat68, actually a few users have already suggested that we auto-log the dataset ID used as an additional configuration section. What do you think?

  				
Posted 3 years ago by AgitatedDove14

It does to me. However, I'm thinking of a situation where a user gets N datasets using Dataset.get but uses only M of them for training, where M < N. Would it make sense to log only the M datasets that were actually used for training? How would that be done?
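
One possible way to do this by hand, sketched under assumptions: decide which of the fetched dataset IDs were actually used, then attach only those to the running task with Task.connect. The helper names (select_used, log_used_datasets), the is_needed predicate, and the "Datasets Used" section name are hypothetical; this is not a built-in ClearML auto-logging feature.

```python
from typing import Callable, Dict, List


def select_used(candidates: List[str], is_needed: Callable[[str], bool]) -> Dict[str, str]:
    """Keep only the dataset IDs the training run actually uses (M of N)."""
    used = [ds_id for ds_id in candidates if is_needed(ds_id)]
    # Store plain strings so the record shows up as simple values on the task.
    return {"used_dataset_ids": ",".join(used), "used_count": str(len(used))}


def log_used_datasets(record: Dict[str, str]) -> None:
    """Attach the record to the current ClearML task, if there is one."""
    # Imported lazily so select_used can run without clearml installed.
    from clearml import Task

    task = Task.current_task()
    if task is not None:
        # "Datasets Used" is an assumed section name, chosen for illustration.
        task.connect(record, name="Datasets Used")
```

Calling log_used_datasets(select_used(all_ids, my_condition)) after the selection step would record only the datasets that made it into training.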

  				
Posted 3 years ago by VexedCat68

Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not in an external database.
Does that make sense?

  				
Posted 3 years ago by AgitatedDove14

VexedCat68, that's a good question! I'm not sure whether ClearML keeps track of that; I need to check.

However, I think a neat solution could be using the datasets as task configuration parameters. This way you can track which datasets were used and you can set up new runs with different datasets.
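
A minimal sketch of that idea, assuming the dataset IDs are known up front: put them in a dict, connect the dict to the task, and fetch each dataset by the connected ID. The helper names, the "dataset_0"-style keys, the "Datasets" section name, and the project and task names are all placeholders for illustration.

```python
from typing import Dict, List


def build_dataset_config(dataset_ids: List[str]) -> Dict[str, str]:
    """Map hypothetical keys such as 'dataset_0' to ClearML dataset IDs."""
    return {f"dataset_{i}": ds_id for i, ds_id in enumerate(dataset_ids)}


def fetch_datasets(config: Dict[str, str]) -> Dict[str, str]:
    """Connect the dataset IDs to the task, then fetch a local copy of each."""
    # Imported lazily so build_dataset_config can run without clearml installed.
    from clearml import Dataset, Task

    # Placeholder project and task names.
    task = Task.init(project_name="examples", task_name="train with datasets")
    # connect() stores the dict as a configuration section on the task, so the
    # IDs are visible in the UI and can be changed when the task is cloned.
    config = task.connect(config, name="Datasets")
    return {
        key: Dataset.get(dataset_id=ds_id).get_local_copy()
        for key, ds_id in config.items()
    }
```

Because the IDs live in the task configuration, a cloned run can point at different dataset versions without any code changes.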

  				
Posted 3 years ago by CostlyOstrich36

Let me try to be a bit clearer.

Say I have a training task in which I get multiple ClearML Datasets by their dataset IDs. In that script I get local copies, train the model, save the model, and delete the local copies.

Does ClearML keep track of which data versions were fetched and used from ClearML Data?

  				
Posted 3 years ago by VexedCat68

VexedCat68, correct. But not only argparse: the entire configuration section 🙂

  				
Posted 3 years ago by CostlyOstrich36

