Answered
Hi, bug report. I was trying to upload data to S3 via clearml.Dataset interface

def finalize(dataset: Dataset, hyperparams: HyperParams) -> None:
    dataset.add_files(f"{DATA_DIR}", wildcard="*.parquet", verbose=hyperparams.verbose)
    dataset.upload(verbose=hyperparams.verbose)
    dataset.finalize(verbose=hyperparams.verbose)
    Task.current_task().mark_completed()

The logs showed me that everything was uploaded (3 chunks of data)

...
2022-07-04 10:50:29,771 - clearml.storage - INFO - Uploading: 320.25MB / 321.25MB @ 208.26MBs from /tmp/dataset.zip
File compression and upload completed: total size 1.31 GB, 3 chunked stored (average size 438.24 MB)
Updating statistics and genealogy
2022-07-04 13:50:46 Process completed successfully

However, there were only 2 chunks in my artifacts (data_001, data_002).

It does not happen consistently, so I don’t know whether it was an AWS issue or something went wrong inside ClearML. Maybe a checksum should be applied to verify the upload status.
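To illustrate the kind of checksum verification suggested here, a minimal sketch using only the Python standard library follows. It is not ClearML functionality: `build_manifest`, `find_missing_or_corrupt`, and the idea of re-downloading the dataset into a local folder for comparison are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(local_dir, pattern="*.parquet"):
    """Hypothetical helper: map each matching file (relative path) to its checksum."""
    root = Path(local_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in root.rglob(pattern)
    }


def find_missing_or_corrupt(expected, downloaded_dir):
    """Hypothetical helper: compare a manifest against a downloaded copy.

    Returns a sorted list of (relative_path, problem) tuples; an empty list
    means every file is present and matches its recorded checksum.
    """
    root = Path(downloaded_dir)
    problems = []
    for rel_path, checksum in sorted(expected.items()):
        candidate = root / rel_path
        if not candidate.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of_file(candidate) != checksum:
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

The manifest would be built from the local data directory before the upload, and the check run against a fresh download of the dataset afterwards. A lost chunk would then show up as a list of "missing" files instead of going unnoticed.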

  
  
Posted one year ago

Answers 7


Hi NonchalantGiraffe17! Thanks for reporting this. It would be easier for us to check whether something is wrong with ClearML if we knew the number and sizes of the files you are trying to upload (the content is not relevant). Could you provide those?

  
  
Posted one year ago

Hi,
It would be great if you could also send your clearml package version 🙂
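One way to look up the installed version, as a small sketch using the standard library's `importlib.metadata` (the helper name `installed_version` is made up for this example):

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of the clearml package in the current environment, or None.
print("clearml:", installed_version("clearml"))
```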

  
  
Posted one year ago

SmugDolphin23 As I said, there were 3 chunks of data (up to 483 MB each). Here is the one that was lost:
chats/dispute_gc_chats_133400961_137507205.parquet - 51.93 MB
chats/dispute_gc_chats_170882671_174999873.parquet - 98.5 MB
trades/dispute_gc_trades_143439217_146742448.parquet - 122.29 MB
trades/dispute_gc_trades_133400961_137507205.parquet - 145.08 MB
chats/dispute_gc_chats_146742448_150159762.parquet - 52.49 MB
chats/dispute_gc_chats_160661300_164213405.parquet - 95.77 MB
trades/dispute_gc_trades_153691420_157306621.parquet - 119.58 MB
trades/dispute_gc_trades_184071714_188636116.parquet - 142.56 MB
chats/dispute_gc_chats_150159762_153691420.parquet - 55.52 MB
chats/dispute_gc_chats_157306621_160661300.parquet - 95.21 MB
trades/dispute_gc_trades_150159762_153691420.parquet - 119.65 MB
trades/dispute_gc_trades_160661300_164213405.parquet - 142.66 MB

  
  
Posted one year ago

Perfect! Can you please provide the sizes of the files of the other 2 chunks as well?

  
  
Posted one year ago

The previously mentioned chunk is from a different task. Here are the 2 chunks that were successfully uploaded:

data_001
chats/dispute_gc_chats_127600000_133400961.parquet - 54.36 MB
trades/dispute_gc_trades_170882671_174999873.parquet - 76.4 MB
chats/dispute_gc_chats_140483651_143439217.parquet - 47.43 MB
trades/dispute_gc_trades_137507205_140483651.parquet - 70.56 MB
chats/dispute_gc_chats_184071714_188602761.parquet - 47.98 MB
trades/dispute_gc_trades_140483651_143439217.parquet - 71.04 MB
chats/dispute_gc_chats_153691420_157306621.parquet - 49.93 MB
trades/dispute_gc_trades_184071714_188602761.parquet - 72.76 MB
chats/dispute_gc_chats_174999873_179485776.parquet - 49.73 MB
trades/dispute_gc_trades_174999873_179485776.parquet - 72.82 MB
chats/dispute_gc_chats_143439217_146742448.parquet - 50.94 MB
trades/dispute_gc_trades_133400961_137507205.parquet - 73.72 MB

data_002
chats/dispute_gc_chats_146742448_150159762.parquet - 52.49 MB
trades/dispute_gc_trades_164213405_167495033.parquet - 74.72 MB
chats/dispute_gc_chats_137507205_140483651.parquet - 51.54 MB
trades/dispute_gc_trades_157306621_160661300.parquet - 73.8 MB
chats/dispute_gc_chats_133400961_137507205.parquet - 51.93 MB
trades/dispute_gc_trades_127600000_133400961.parquet - 74.02 MB
chats/dispute_gc_chats_179485776_184071714.parquet - 51.33 MB
trades/dispute_gc_trades_160661300_164213405.parquet - 74.34 MB

  
  
Posted one year ago

Thanks! We have added quite a lot of new dataset features in our latest releases. I would encourage you to update your clearml packages 🙂

  
  
Posted one year ago

SweetBadger76 clearml==1.5.0
WebApp: 1.5.0-192
Server: 1.5.0-192
API: 2.18

  
  
Posted one year ago