I'm on the machine with ClearML Server hosted. Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?

Answered

I'm on the machine with ClearML Server hosted. Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?

  				
Posted 4 years ago by VexedCat68


Answers 12

For example, say there are files in a specific folder on Machine A. A script on Machine A creates a Dataset, adds the files located in that folder, and publishes it. Can you now look at that dataset on the server machine? Not from the ClearML interface, but inside normal directories, like /opt/clearml etc. (that directory is just an example).

  				
Posted 4 years ago by VexedCat68

Also, what's the difference between finalize and publish?

  				
Posted 4 years ago by VexedCat68

Do you mean see the datasets in the UI?

  				
Posted 4 years ago by CostlyOstrich36

I'm not quite sure, I'll need to double check 🙂

  				
Posted 4 years ago by CostlyOstrich36

Also, do I have to manually keep track of dataset versions in a separate database, or does ClearML provide that as well?

  				
Posted 4 years ago by VexedCat68

I'm not in the best position to answer these questions right now.

  				
Posted 4 years ago by VexedCat68

Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?

Hi VexedCat68
Currently, when you create datasets with clearml-data, it has to repackage your files, i.e. upload them. That said, we have received numerous requests for "registering data", and we are looking into it.
Here are the main technical hurdles we are facing, and I would love to get your perspective:
- If the data is not available locally, we cannot calculate the hash of the content, which means there is no verification of consistency.
- We usually do have a way to get the file size, but in some scenarios this is also not possible.
- The assumption is that data packaged by clearml-data will stay intact (immutable); there is very little guarantee of that when just "registering links".
- In terms of interface, if this is "object storage", I think that matching the current interface (i.e. passing a bucket/folder) would make sense. What do you think?
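To illustrate the first two hurdles, here is a minimal sketch of content-based verification for a local folder. It is not ClearML's actual implementation: the choice of SHA-256 and the helper names `hash_file`, `build_manifest`, and `changed_files` are assumptions made for this example. The point is that both the size and the hash require read access to the file bytes, which is exactly what is missing when a link is only registered.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's relative path to (size in bytes, content hash)."""
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            rel = path.relative_to(folder).as_posix()
            manifest[rel] = (path.stat().st_size, hash_file(path))
    return manifest


def changed_files(old: dict, new: dict) -> list:
    """Return relative paths that were added, removed, or modified."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Comparing two manifests taken at different times shows whether the content stayed intact, which is the guarantee that is hard to give for data that is only referenced by a link.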

  				
Posted 4 years ago by AgitatedDove14

That, but also in a proper directory on the file system.

  				
Posted 4 years ago by VexedCat68

We want to get a clearer picture here to compare versioning with ClearML Data vs our own custom versioning

  				
Posted 4 years ago by VexedCat68

So I got my answer for the first one: I found where the data is stored on the server.
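For anyone wanting to do the same, here is a rough sketch of listing stored files directly on the server's file system. The thread does not say where the data was found, so the root path below is an assumption (the fileserver data directory of a default docker-compose install), as is the guess that dataset content is stored as `.zip` archives; `list_stored_files` is a hypothetical helper, not part of ClearML.

```python
from pathlib import Path


def list_stored_files(root: Path, suffix: str = ".zip") -> list:
    """Return (relative path, size in bytes) for files under root with the given suffix."""
    found = []
    for path in sorted(root.rglob(f"*{suffix}")):
        if path.is_file():
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return found


if __name__ == "__main__":
    # Assumed location; adjust to wherever your server actually stores uploads.
    storage_root = Path("/opt/clearml/data/fileserver")
    if storage_root.is_dir():
        for rel_path, size in list_stored_files(storage_root):
            print(f"{size:>12}  {rel_path}")
```

This only inspects the directory tree; it does not tell you which dataset version an archive belongs to, which still has to come from ClearML itself.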

  				
Posted 4 years ago by VexedCat68

I'm still unsure about finalize vs. publish, since upload should already upload the data to the server.

  				
Posted 4 years ago by VexedCat68

Regarding viewing the datasets: can you give an example? I'm not sure I understand how you'd like to view them.

Regarding publish vs. finalize: I think finalize uploads all the files and prepares the dataset for publishing. Once published, it should be accessible to other parts (tasks) in the system.
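For reference, a minimal sketch of the order in which these calls are typically made with the `clearml` Python SDK. It assumes a configured ClearML server and the `clearml` package; the wrapper function `create_and_publish_dataset` and its arguments are hypothetical, and the comments describe the commonly documented behavior of each call, which is worth checking against the SDK reference for your version.

```python
def create_and_publish_dataset(folder: str, project: str, name: str) -> str:
    """Create a dataset version from a local folder and return its ID."""
    # Imported here so this module can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=folder)  # register the local files in this version
    dataset.upload()                # transfer the packaged files to storage
    dataset.finalize()              # close the version so it can no longer change
    dataset.publish()               # mark the finalized version as published
    return dataset.id
```

Under that reading, upload moves the data, finalize freezes the version, and publish is a further status change on an already finalized dataset.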

  				
Posted 4 years ago by CostlyOstrich36
