I think that by default the zipped package files are 0.5 GB
(you can control it, look for --chunk-size)
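For example, a minimal sketch of setting it from the SDK when building a version (names and paths here are placeholders, and it assumes upload() takes chunk_size in MB, with --chunk-size as the CLI counterpart):
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="./data")
# split the uploaded archives into ~512 MB zip chunks instead of the default
ds.upload(chunk_size=512)
ds.finalize()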
I think the missing part of the API is understanding which chunk your specific file is stored in.
You can do something like:
from clearml import Dataset
ds = Dataset.get(...)
# file_entries_dict maps relative path -> FileEntry; artifact_name is the zip chunk the file lives in
the_artifact_chunk_I_need = ds.file_entries_dict["my/file/here"].artifact_name
wdyt?
maybe worth adding an interface?
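If you then want to pull only that chunk, something along these lines might work. Rough sketch only, not a confirmed interface: it assumes the chunk zips are stored as artifacts on the dataset's backing task and that the task id matches the dataset id (this can depend on the ClearML version):
from clearml import Dataset, Task

ds = Dataset.get(dataset_id="<your dataset id>")
chunk_name = ds.file_entries_dict["my/file/here"].artifact_name
# fetch just that one zip chunk from the dataset's backing task
chunk_zip_path = Task.get_task(task_id=ds.id).artifacts[chunk_name].get_local_copy()
print(chunk_zip_path)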
I would like to start off by saying that I absolutely love clearml.
@MassiveGoldfish6 thank you for saying that! 😍
Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?
Well, by default files are packaged into multiple zip files; you can control the size of each zip for finer granularity, but when you download, you download the entire packaged zip. If the version contains only a few changes (compared to its parent versions), that zip file might be relatively small.
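To illustrate the parent-version point, a minimal sketch of creating a child version that packages only the changed files (project/dataset names and paths are placeholders):
from clearml import Dataset

parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
child = Dataset.create(dataset_name="my_dataset", dataset_project="my_project",
                       parent_datasets=[parent.id])
# only the newly added / changed files get packaged into this version's zip chunks
child.add_files(path="./changed_files")
child.upload()
child.finalize()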
Does that help? What is the use case? How large is the dataset?
That is very useful. Thank you.
A use case would be the following: I have a 200 GB dataset and I want to pull 3 files that are 20 MB each.
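For that kind of case, a possible flow (building on the sketch above; paths and names are placeholders) would be to map each wanted file to the chunk that holds it and only fetch those chunks:
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
wanted = ["images/a.png", "images/b.png", "images/c.png"]
# map each wanted file to the zip chunk that contains it
chunks_needed = {ds.file_entries_dict[p].artifact_name for p in wanted}
print(chunks_needed)  # usually a handful of chunks, far smaller than the full 200 GB
Each chunk in chunks_needed could then be fetched and unpacked as in the earlier sketch, instead of calling get_local_copy() on the whole dataset.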