Answered
Hi, All I Have Some Issues Uploading A Big (100Gb) Dataset To Self-Hosted Clearml Server. Is There Any Tricks I Should Be Aware When Launching The Server? Maybe Configuring Timeout Or Giving More Resources? Right Now The Upload Freezes And In The Web-Int

Hi, all

I have some issues uploading a big (100 GB) dataset to a self-hosted ClearML server. Are there any tricks I should be aware of when launching the server? Maybe configuring a timeout or giving it more resources? Right now the upload freezes, and in the web interface the dataset is marked as Aborted, with the status message "Forced stop (non-responsive)" in the Info tab. I created an issue on GitHub before thinking of Slack. Let me know if I should delete it.

Kind regards

  
  
Posted 10 months ago
Votes Newest

Answers 6


I am not sure about that. I have another dataset with a similar structure which is smaller (40 GB) and which uploaded successfully. It seems this is how it works: first it computes the SHA for all the files, and then, during uploading, it aggregates small files into zip archives of approx. 512 MB each.
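To make the two steps I described concrete, here is a rough sketch of hashing every file and then grouping files into batches of roughly 512 MB. This is only an illustration of the behaviour I observed, not ClearML's actual code: the function names `sha256_of` and `plan_chunks` and the exact size limit are my own assumptions.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List

# Assumed target size, taken from the ~512 MB archives observed above.
CHUNK_LIMIT_BYTES = 512 * 1024 * 1024


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def plan_chunks(files: Iterable[Path], limit: int = CHUNK_LIMIT_BYTES) -> List[List[Path]]:
    """Greedily group files into batches whose total size stays within `limit`.

    A single file larger than `limit` still gets a batch of its own.
    """
    chunks: List[List[Path]] = []
    current: List[Path] = []
    current_size = 0
    for f in files:
        size = f.stat().st_size
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

With about 1.1 million small files, a scheme like this would produce a couple of hundred archives for 100 GB, so the number of upload requests should be far smaller than the number of files.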

  
  
Posted 10 months ago

Hi @<1547390422483996672:profile|StaleElk72> , are you getting an error at any point? This is indeed a large file, and I assume you're uploading it to the ClearML fileserver, and not to some object storage like S3?

  
  
Posted 10 months ago

In that case I assume this is just a series of a lot of small (?) uploads which take a lot of time

  
  
Posted 10 months ago

image

  
  
Posted 10 months ago

It's a directory (the SHA generation step was actually successful:

Generating SHA2 hash for 1136604 files

as in the GitHub issue). Given previous experience, I would expect it to be uploaded as multiple zip files.

Yes, I don't use S3. I have a dedicated machine with RAID configured, where the ClearML server is running.

  
  
Posted 10 months ago

I ended up using DVC for dataset management. It doesn't have a fancy UI, but it works flawlessly with large datasets.

  
  
Posted 10 months ago