Answered
I'm on the machine with ClearML Server hosted. Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?

  
  
Posted 3 years ago

Answers 12


Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?

Hi VexedCat68
Currently, when you create datasets with clearml-data, it has to repackage your files, i.e. upload them. That said, we have received numerous requests for "registering data", and we are looking into it.
Here are the main technical hurdles we are facing, and I would love to get your perspective:
- If the data is not available locally, we cannot calculate the hash of the content, which means there is no verification of consistency.
- We usually do have a way to get the file size, but in some scenarios this is also not possible.
- The assumption is that data packaged by clearml-data will stay intact (immutable); there is very little guarantee of that when just "registering links".
In terms of interface, if this is "object storage", I think that matching the current interface (i.e. passing a bucket/folder) would make sense. What do you think?
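
To illustrate the hashing point, here is a minimal sketch of content-based verification on local files. It is not how clearml-data does it internally; the function names (`file_sha256`, `build_manifest`, `changed_files`) and the choice of SHA-256 are assumptions made for the example. It only shows why local access to the bytes matters: a file that changes without changing size is caught by the hash but not by a size check.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, tuple[int, str]]:
    """Map each file's path (relative to root) to (size in bytes, SHA-256)."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = (path.stat().st_size, file_sha256(path))
    return manifest


def changed_files(old: dict, new: dict) -> list[str]:
    """Return relative paths that were added, removed, or modified."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Comparing two manifests taken at different times gives the list of files that no longer match. For a remote link that cannot be read, neither field of the manifest entry can be trusted, which is the consistency gap described above.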

  
  
Posted 3 years ago

I'm not quite sure, I'll need to double check 🙂

  
  
Posted 3 years ago

Also, do I have to manually keep track of dataset versions in a separate database? Or am I provided that as well in ClearML?

  
  
Posted 3 years ago

I'm not in the best position to answer these questions right now.

  
  
Posted 3 years ago

That, but also in the proper directory on the file system.

  
  
Posted 3 years ago

I'm still unsure about the difference between finalize and publish, since upload should already upload the data to the server.

  
  
Posted 3 years ago

Do you mean see the datasets in the UI?

  
  
Posted 3 years ago

We want to get a clearer picture here to compare versioning with ClearML Data vs our own custom versioning

  
  
Posted 3 years ago

Also what's the difference between Finalize vs Publish?

  
  
Posted 3 years ago

For example, there are files in a specific folder on Machine A. A script on Machine A creates a Dataset, adds the files located in that folder, and publishes it. Can you then look at that dataset on the server machine? Not from the ClearML interface, but inside normal directories, such as /opt/clearml (this directory is just an example).

  
  
Posted 3 years ago

Regarding viewing the datasets: can you give an example? I'm not sure I understand how you'd like to view them.

Regarding Publish vs Finalize: I think finalize uploads all the files and prepares the dataset for publishing. Once published, it should be accessible to other parts (tasks) of the system.

  
  
Posted 3 years ago

So I got my answer to the first question: I found where the data is stored on the server.

  
  
Posted 3 years ago