
Hi community :)
I'm new to ClearML and seeking advice on best practices for managing datasets. I have two types of datasets:
(1) PDFs
(2) Tabular data stored in Excel.

Question 1: Tracking changes in different versions of Excel files
I frequently update my Excel datasets by adding new data and deleting old entries. Can I track these changes across different versions in ClearML?

For instance, if I upload an initial Excel file and later make modifications, is there a way to compare the versions to see what data was added or removed?
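A row-level comparison can be done outside ClearML once both versions of the Excel file are on disk, for example after pulling each dataset version locally with `Dataset.get(dataset_id=...).get_local_copy()`. The sketch below is a minimal example under stated assumptions: the sheet has a column whose values uniquely identify each row (called `id` here, a hypothetical name), and `load_rows`/`diff_rows` are illustrative helpers, not part of ClearML. Reading `.xlsx` files requires pandas with openpyxl installed.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# A table row, as produced by pandas' DataFrame.to_dict(orient="records").
Row = Dict[str, object]


def load_rows(path: str, sheet_name=0) -> List[Row]:
    """Read one Excel sheet into a list of row dicts (requires pandas + openpyxl)."""
    import pandas as pd  # imported lazily so diff_rows works without pandas

    df = pd.read_excel(path, sheet_name=sheet_name)
    # Replace NaN with None: NaN != NaN, so empty cells would otherwise
    # make every row that contains them look "changed".
    df = df.astype(object).where(df.notna(), None)
    return df.to_dict(orient="records")


def diff_rows(
    old: Iterable[Row], new: Iterable[Row], key: str
) -> Tuple[List[Row], List[Row], List[Tuple[Row, Row]]]:
    """Compare two versions of a table using a column that uniquely identifies rows."""
    old_by_key: Dict[Hashable, Row] = {row[key]: row for row in old}
    new_by_key: Dict[Hashable, Row] = {row[key]: row for row in new}

    # Rows whose key exists only in the new version.
    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    # Rows whose key is missing from the new version.
    removed = [row for k, row in old_by_key.items() if k not in new_by_key]
    # Rows present in both versions whose cell values differ.
    changed = [
        (old_by_key[k], row)
        for k, row in new_by_key.items()
        if k in old_by_key and old_by_key[k] != row
    ]
    return added, removed, changed


if __name__ == "__main__":
    # In practice: v1 = load_rows("v1/data.xlsx"); v2 = load_rows("v2/data.xlsx")
    v1 = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
    v2 = [{"id": 2, "price": 25}, {"id": 3, "price": 30}]
    added, removed, changed = diff_rows(v1, v2, key="id")
    print("added:", added)      # added: [{'id': 3, 'price': 30}]
    print("removed:", removed)  # removed: [{'id': 1, 'price': 10}]
    print("changed:", changed)  # changed: [({'id': 2, 'price': 20}, {'id': 2, 'price': 25})]
```

Keying on a unique column rather than on row position means that inserting or deleting a row does not make every row after it appear changed.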

Question 2: Handling data stored in S3 without storing it in ClearML
I have a bucket in S3 that stores PDFs, and I prefer not to store these files directly in ClearML. Is there a way to track changes to the files in this S3 bucket, such as monitoring which files have been added or removed?

Thanks!

  				
Posted 7 months ago by GloriousKoala29

2 Answers

Hi GloriousKoala29, to address your questions:

1. No, that is not currently possible. Think of the Datasets feature as a catalogue of data: you can see what data is saved, but you can only see what's inside the files once you pull them locally.
2. I'm afraid not. ClearML basically saves links to the data but doesn't directly "look" at the data.
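Because ClearML only keeps links, one workaround for seeing which PDFs were added, removed, or overwritten is to keep your own record of the bucket: periodically save a listing of object keys and their ETags, then compare the newest listing with the previous one. This is a minimal sketch, not a ClearML feature. It assumes boto3 is installed and AWS credentials are configured; the helper names, sample keys, and snapshot file paths are hypothetical.

```python
import json
from typing import Dict, List


def snapshot_bucket(bucket: str, prefix: str = "") -> Dict[str, str]:
    """Map every object key under `prefix` to its ETag (requires boto3 + AWS credentials)."""
    import boto3  # imported lazily so diff_snapshots works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    snapshot: Dict[str, str] = {}
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            snapshot[obj["Key"]] = obj["ETag"]
    return snapshot


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report keys added, removed, or re-uploaded between two bucket listings."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A different ETag means the object was re-uploaded. For multipart
        # uploads the ETag is not a plain MD5, so read this as "probably changed".
        "modified": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


def save_snapshot(snapshot: Dict[str, str], path: str) -> None:
    """Write a listing to JSON so the next run can compare against it."""
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2, sort_keys=True)


def load_snapshot(path: str) -> Dict[str, str]:
    """Read a listing previously written by save_snapshot."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # In practice: previous = load_snapshot("snapshot_prev.json")
    #              current = snapshot_bucket("my-pdf-bucket", prefix="reports/")
    previous = {"reports/a.pdf": "etag-1", "reports/b.pdf": "etag-2"}
    current = {"reports/b.pdf": "etag-3", "reports/c.pdf": "etag-4"}
    print(diff_snapshots(previous, current))
    # {'added': ['reports/c.pdf'], 'removed': ['reports/a.pdf'], 'modified': ['reports/b.pdf']}
```

The snapshot files are small JSON documents, so if you want the history kept next to your experiments, each one could be attached to a ClearML task, for example with `Task.upload_artifact`.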

  				
Posted 7 months ago by CostlyOstrich36

I see, thanks for that!

  				
Posted 7 months ago by GloriousKoala29


Related questions:

Hi, I have a question regarding ClearML Datasets. In the web UI, what causes the "Content" tab to show a list of the files in the dataset? It used to show automatically, but recently it shows "No data to show" even though all files are definitely in the…

Hi, I have a problem using clearml-serving. The tutorial shows that we can only send bytes or dict data to the preprocess method of the Preprocess object, but is it possible to send files such as CSV, etc.?

How can I access the commit and uncommitted-changes information displayed in the web app on the Execution tab of a task? I don't see corresponding functions in the SDK to get that data.

Hi! Is there any way to add a git-like ignore file for versioning ClearML data? I saw a wildcard argument in the docs for adding files to a dataset. How can I specify that some file types should be ignored? For example, I want to ignore ipynb checkpoints. How can I do…

I wonder if there is any way to sync data on cloud storage (e.g. S3 or MinIO) using the clearml-data sync command? I don't see it mentioned anywhere in the documentation; we can upload local files, but not files on the cloud.

Hello guys, I have a question about the local cache. Right now I'm trying to store a pretty large dataset in the cache (over 20 million files and 3 TB of data). I use a…

In my current project I generate the data from an SQL query. Is the only way to register the dataset with ClearML to write the files to disk first, or is there another method? This leads into the second issue I have, which is what happens when I store the…

Hi, if I work with Excel files and add new features to them, how can ClearML help me track the features? How should I store my samples (500,000) to maximize the benefit?

Hi, I'm very new to ClearML and this concept overall. Does clearml-data track features in an Excel file, i.e. can it show me how the Excel file evolved? Thanks