
Hello there again!

So, I discovered by accident (as it usually happens) that ClearML apparently uses pickle as a backup serialization method when serializing with json fails, BUT ONLY during upload.
When a dataset whose state was pickled instead of JSON-encoded is retrieved, ClearML doesn't attempt to use pickle if deserializing with json fails.

Now, ideally it maybe shouldn't fall back to pickle when json fails and should just fail outright instead. Or, of course, pickle could also be added as a backup deserializer for when json fails to load the state.
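The symmetric-fallback idea could look roughly like this. This is a minimal sketch using plain json/pickle, not ClearML's actual internal serializer, and the function names are hypothetical:

```python
import json
import pickle


def serialize_state(state):
    """Mirror the upload-side behaviour: try JSON first, and fall back
    to pickle only when JSON can't encode the object."""
    try:
        return json.dumps(state).encode("utf-8")
    except TypeError:
        return pickle.dumps(state)


def deserialize_state(blob):
    """The suggested symmetric loader: try JSON, then pickle, instead
    of failing outright when the stored state turns out to be pickled."""
    try:
        return json.loads(blob)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return pickle.loads(blob)
```

A set round-trips here even though json can't encode it, because the loader mirrors the writer's fallback.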

  
  
Posted 4 months ago

Answers 2


To be clear, this mostly occurred because of a somewhat unintended use of ClearML. If you remember, I previously had trouble adding external files because of how many requests the .exists() and .get_metadata() calls were sending to the server. There is a way to list files with a common prefix in a bucket in batches, so: fewer requests, more files. Those listing requests also return the metadata, so I essentially skipped .add_external_files entirely and created LinkEntry objects "manually", added them to ._dataset_link_entries, called the method that updates the modified and added counts, and then called ._serialize (which is pretty much what .add_external_files does at the end anyway).
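The workflow above can be sketched like this. These are illustrative stand-ins only: the real clearml LinkEntry and Dataset internals have more fields and different signatures, and the method name here is made up:

```python
from dataclasses import dataclass, field


# Stand-in for clearml's LinkEntry; the real class has more fields.
@dataclass
class LinkEntry:
    link: str            # keep this a plain str, not an S3Path object
    size: int
    relative_path: str


@dataclass
class DatasetSketch:
    _dataset_link_entries: dict = field(default_factory=dict)

    def add_from_listing(self, bucket, listing):
        """listing: (key, size) pairs from one batched prefix-listing
        request that already carries the metadata, replacing one
        .exists()/.get_metadata() round-trip per file."""
        for key, size in listing:
            self._dataset_link_entries[key] = LinkEntry(
                link=f"s3://{bucket}/{key}", size=size, relative_path=key
            )
        return len(self._dataset_link_entries)
```

One listing call populates many entries at once, which is where the request savings come from.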

So, by accident, I added LinkEntry objects whose link was an S3Path. Serializing that with json then failed (probably because the default json encoder doesn't know how to handle an S3Path), ClearML fell back to pickle, the pickled state uploaded successfully, and everything went well, until...
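The upload-side fallback is easy to reproduce with plain json/pickle. Here pathlib's PurePosixPath stands in for an S3Path (a path-like object the default json encoder rejects); this is a repro of the mechanism, not ClearML's actual code:

```python
import json
import pickle
from pathlib import PurePosixPath

# A path-like object as a link value, standing in for an S3Path.
state = {"link": PurePosixPath("my-bucket/data/file.txt")}

try:
    blob = json.dumps(state).encode("utf-8")
except TypeError:
    # The default json encoder raises TypeError on path objects,
    # so the state silently ends up pickled instead.
    blob = pickle.dumps(state)
```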

I call clearml.Dataset.get(...) and it fails with a UnicodeDecodeError while loading the state: it can't decode some byte at some position, as one would expect, since the state was uploaded in a binary (pickled) format that json can't decode. However, it failed without attempting to use pickle as an alternative deserializer. It's just a bit of an inconsistency, though personally I feel it shouldn't have fallen back to pickle during the upload in the first place.
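The retrieval-side failure is just as easy to reproduce: feeding pickled bytes to json.loads raises exactly a UnicodeDecodeError, because pickle's binary opcodes aren't valid UTF-8. Again a minimal repro, not ClearML's actual loading code:

```python
import json
import pickle

# The state as it was uploaded: pickled, i.e. binary.
blob = pickle.dumps({"files": ["a.txt", "b.txt"]})

try:
    json.loads(blob)                 # what loading the state amounts to
    recovered = None
except UnicodeDecodeError:
    recovered = pickle.loads(blob)   # the fallback that never happens
```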

(Though thinking about it now, I suppose it may have happened because I used ._serialize, in which case, well, that's on me for using internal API. But frankly there was no way around it: it literally didn't work any other way, with .exists retrying infinitely, and given that I could already get the metadata this way, the uploads now take about 2 to 3 hours instead of 40 or more.)

TL;DR
It pickled the state during the upload (likely due to my use of clearml.Dataset._serialize and, by accident, having LinkEntry objects with S3Path objects as their link) and then failed to load it with json when retrieving the state via clearml.Dataset.get, without attempting to deserialize the state with pickle before failing completely.

  
  
Posted 4 months ago

Hi @<1724235687256920064:profile|LonelyFly9>! I assume in this case we fail to retrieve the dataset? Can you provide an example where this happens?

  
  
Posted 4 months ago