To be clear, this mostly happened because of a probably slightly unintended use of clearml. If you remember, I previously had trouble adding external files because of how many requests the `.exists()` and `.get_metadata()` calls were sending to the server. There is a way to list files with a common prefix in a bucket in batches, so: fewer requests, more files. Those listing requests also return the metadata, so essentially I skipped `.add_external_files` entirely, created `LinkEntry`s "manually", and added them directly to `._dataset_link_entries`. I then also had to call the method that updates the modified and added counts, and finally call `._serialize` (which is pretty much what `.add_external_files` does at the end anyway).
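The batched-listing idea can be sketched roughly like this. Note that `objects_to_entries`, `list_bucket_entries`, and the entry fields (`link`/`size`/`timestamp`) are hypothetical names for illustration, not the actual clearml `LinkEntry` signature:

```python
# Sketch: one paginated S3 listing returns names *and* metadata, so no
# per-file .exists()/.get_metadata() round-trips are needed.

def objects_to_entries(objects, bucket):
    """Turn S3 listing results (dicts with Key/Size/LastModified)
    into link-entry-like dicts."""
    return [
        {
            "link": "s3://{}/{}".format(bucket, obj["Key"]),
            "size": obj["Size"],
            "timestamp": obj["LastModified"],
        }
        for obj in objects
    ]


def list_bucket_entries(bucket, prefix):
    """List every object under a common prefix, up to 1000 per request."""
    import boto3  # local import keeps the pure helper above dependency-free

    s3 = boto3.client("s3")
    entries = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        entries.extend(objects_to_entries(page.get("Contents", []), bucket))
    return entries
```

Each `list_objects_v2` page already carries the size and last-modified metadata, which is what makes the per-file metadata calls redundant.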
So, by accident, I added `LinkEntry`s whose `link` was an `S3Path`. Serializing that with `json` failed (probably because the default `json` encoder doesn't know how to handle an `S3Path`), so it fell back to `pickle`, which succeeded, and the pickled state was uploaded without complaint. Everything went well, until...
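The upload-side fallback can be reproduced in isolation. A minimal sketch of what presumably happens (this `S3Path` stand-in and `serialize_state` are illustrative, not clearml's actual code):

```python
import json
import pickle


class S3Path:
    """Stand-in for the real S3Path: json's default encoder can't handle it."""

    def __init__(self, path):
        self.path = path


def serialize_state(state):
    """Try json first; on failure, fall back to pickle (as the upload did)."""
    try:
        return json.dumps(state).encode("utf-8"), "json"
    except TypeError:
        # json.dumps raises TypeError for objects it cannot encode
        return pickle.dumps(state), "pickle"


data, fmt = serialize_state({"link": S3Path("s3://bucket/key")})
# fmt is "pickle": one non-JSON-serializable value silently forces
# the whole state into the binary format
```

With a plain `str` link, `json.dumps` succeeds and the state stays in the text format; the `S3Path` is what flipped it.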
Then I called `clearml.Dataset.get(...)` and it failed with a `UnicodeDecodeError` while loading the state. The error itself is expected, since the state was uploaded in a binary (pickled) format and `json` shouldn't be able to decode it; the problem is that it failed outright and never attempted `pickle` as an alternative deserializer. It's just a bit of an inconsistency, though personally I feel it shouldn't have fallen back to `pickle` during the upload in the first place.
(Though, thinking about it now, I suppose that fallback may only have happened because I used `._serialize` directly, in which case, well, that's on me for using internal API. But frankly there was no way around it: it literally didn't work any other way, `.exists` kept retrying indefinitely, I already had the metadata from the listing anyway, and with this approach the uploads take 2 to 3 hours instead of 40 or more.)
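The asymmetry on the read side is easy to demonstrate: `json` chokes on pickle's opcodes with a `UnicodeDecodeError` instead of trying `pickle`. A sketch of the failure, plus a hypothetical symmetric loader (not clearml's code) that would mirror the upload-side fallback:

```python
import json
import pickle

pickled_state = pickle.dumps({"entries": ["s3://bucket/key"]})

# What the load effectively does today: json only, no fallback.
try:
    json.loads(pickled_state)
    raise AssertionError("unreachable: pickle bytes are not valid JSON")
except UnicodeDecodeError:
    pass  # this is the error Dataset.get surfaced


def deserialize_state(data):
    """A loader symmetric with the json-then-pickle upload fallback."""
    try:
        return json.loads(data)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return pickle.loads(data)


state = deserialize_state(pickled_state)
# state == {"entries": ["s3://bucket/key"]}
```

Pickle protocol 2+ streams start with the `\x80` PROTO opcode, which is never a valid UTF-8 start byte, hence the `UnicodeDecodeError` rather than a `JSONDecodeError`.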
TL;DR
The state got pickled during the upload (likely due to my use of `clearml.Dataset._serialize` and, by accident, having `LinkEntry`s with `S3Path` objects as their `link`s), and then `clearml.Dataset.get` failed to load the state with `json`, without attempting to deserialize it with `pickle` before failing completely.
Hi @<1724235687256920064:profile|LonelyFly9> ! I assume in this case we fail to retrieve the dataset? Can you provide an example when this happens?