
Hi,
I'm experiencing some fairly slow uploads of a new dataset version. I'm running a local server and uploading a ~20GB update to a ~30GB dataset consisting of a few hundred files, each up to several hundred MB. Compression and the upload itself seem quite fast, but the fileserver.py process on the host runs for up to an hour (far longer than when I upload the whole thing as a separate dataset). Is this expected? Are there any best practices for tuning the compression method or chunk size?
Thanks!
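For reference, the two settings asked about here are exposed as keyword arguments of ClearML's `Dataset.upload()` in recent SDK versions: `compression` (a `zipfile` compression constant) and `chunk_size` (target size of each uploaded zip chunk, in MB). The sketch below is a hypothetical helper, not part of ClearML; the 512 MB value is only an illustration, not a tested recommendation for this dataset.

```python
import zipfile


def upload_kwargs(data_is_compressed: bool, chunk_size_mb: int = 512) -> dict:
    """Build the compression/chunk_size keyword arguments for Dataset.upload().

    Hypothetical helper. Files that are already compressed (JPEG, PNG, .gz, ...)
    gain little from DEFLATE, so storing them as-is avoids spending CPU on
    recompression. chunk_size is the target size of each zip chunk, in MB.
    """
    compression = zipfile.ZIP_STORED if data_is_compressed else zipfile.ZIP_DEFLATED
    return {"compression": compression, "chunk_size": chunk_size_mb}
```

Usage, assuming `dataset` is a `clearml.Dataset` whose files have already been added: `dataset.upload(**upload_kwargs(data_is_compressed=True))`. Larger chunks mean fewer, bigger artifacts to transfer; smaller chunks mean more of them.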

  
  
Posted 9 months ago

Answers 6


It does feel like the server is struggling, since the web UI also has trouble loading debug sample artifacts during the upload, but I'm not sure why that would be the case. The client console hangs after "uploading dataset changes", and I can see the fileserver.py process putting load on the server CPU, but I don't see any files being added or changed in the local fileserver folder. Is there a way to check what the fileserver is doing? I don't see anything suspicious in the log.
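One stdlib-only way to check whether fileserver.py is actually writing to disk while it uses CPU is to sample its `write_bytes` counter from `/proc/<pid>/io`. This sketch assumes a Linux host where the process is visible in `/proc` (processes inside a Docker container are normally visible from the host as well) and enough permission, usually root, to read another process's `io` file. The helper names are hypothetical, and the sketch only measures disk writes; it does not show what the fileserver is computing.

```python
import os
import time
from typing import Dict, List


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a Linux /proc/<pid>/io file."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def find_pids(pattern: str = "fileserver.py") -> List[int]:
    """Return PIDs whose command line contains `pattern` (Linux /proc only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue  # skip non-process entries and this script itself
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def bytes_written(pid: int, interval: float = 5.0) -> int:
    """Bytes the process sent to the storage layer during `interval` seconds."""
    def snapshot() -> int:
        with open(f"/proc/{pid}/io") as f:
            return parse_proc_io(f.read())["write_bytes"]

    before = snapshot()
    time.sleep(interval)
    return snapshot() - before
```

For example, run `for pid in find_pids(): print(pid, bytes_written(pid))` on the host while the client hangs. A CPU-busy fileserver whose `write_bytes` barely moves would match the behaviour described in this thread.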

  
  
Posted 9 months ago

Hi @ShaggySwan64, so the issue is when writing to the file server? Is it possible that the machine itself is having a hard time writing the data?

  
  
Posted 9 months ago

I should probably add that a lot of the update consists of file modifications...

  
  
Posted 9 months ago

Huh. So it looks like the issue was spawning too many upload workers, which overwhelmed the fileserver (limited to a single core)...? When I limited max_workers in upload() on the client side, it went smoothly with no hanging. The funny thing is that I had no issues with this when using sync_folder(), which I used for the original data upload, hence the difference in performance I perceived despite similar file sizes.
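A minimal sketch of the workaround described above, assuming the `clearml` package and a configured local server: pass a small `max_workers` to `Dataset.upload()` so the single-core fileserver only receives a few uploads at a time. The cap of 2 workers, the `capped_workers` helper, the dataset and project names, the parent ID and the local path are all hypothetical placeholders, not values from this thread.

```python
import os
from typing import Optional


def capped_workers(cap: int = 2, cpu_count: Optional[int] = None) -> int:
    """Return a small worker count for Dataset.upload(max_workers=...).

    Hypothetical policy: never exceed `cap` (to avoid flooding a fileserver
    running on a single core), never exceed the local CPU count, never go below 1.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cap, cpus))


def upload_update(parent_id: str, local_path: str) -> None:
    # Requires `pip install clearml` and a reachable ClearML server.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name="my-dataset",      # hypothetical name
        dataset_project="my-project",   # hypothetical project
        parent_datasets=[parent_id],    # the previous dataset version
    )
    # Add, modify and remove entries so the new version mirrors local_path.
    dataset.sync_folder(local_path=local_path)
    # Fewer parallel uploads keep the fileserver from being overwhelmed.
    dataset.upload(show_progress=True, max_workers=capped_workers())
    dataset.finalize()
```

Capping at a fixed small number rather than at the client's core count is the point here: the bottleneck in this thread was the server, so the client's CPU count is the wrong thing to scale with.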

  
  
Posted 9 months ago

@CostlyOstrich36 I'd be glad for any ideas about what might be happening.

  
  
Posted 9 months ago

With the original 30GB dataset, it took just a few seconds to go from uploading the last chunk of data to "File compression and upload completed", so I find it odd that the upload of the update hangs indefinitely while processing, without using the disk at all.

  
  
Posted 9 months ago