You can get a chunk number that contains your file and download that chunk.
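For instance, something like this (a minimal sketch; the project and dataset names are placeholders, and it assumes the part / num_parts arguments of get_local_copy() are available in your clearml version and that parts align with the dataset's zip chunks):

```python
from clearml import Dataset

# Get a handle on the dataset (metadata only, nothing is downloaded yet).
# "my_project" / "my_dataset" are placeholder names.
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# Download only one part of the dataset instead of the whole thing.
# Assumption: the dataset was uploaded in several chunks and part indices follow chunk order.
local_path = ds.get_local_copy(part=1, num_parts=8)
print("Chunk downloaded to:", local_path)
```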
I also thought ClearML writes this mapping (state.json) into one of its databases: Mongo, Redis, or Elasticsearch.
Hi @<1523701070390366208:profile|CostlyOstrich36>. Thank you for your advice, it definitely makes sense. Regarding the first point, each dataset has a state.json file. In this file there is a key artifact_name, e.g., data, data_001, etc., along with the relative path of each file. I thought I could map this key to the chunk number. So, if I pull this file from the S3 bucket, I can work out which chunk I should download to get a specific file. Am I wrong?
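Roughly what I had in mind (just a sketch; the exact layout of state.json is not documented, so the key names below, such as dataset_file_entries, relative_path, and a per-file artifact_name, are assumptions based on what I see in my bucket):

```python
import json

# Assumption: state.json was already pulled locally from the S3 bucket, and each file
# entry carries a relative path plus an "artifact_name" field (data, data_001, ...).
with open("state.json") as f:
    state = json.load(f)

def iter_file_entries(state):
    """Yield per-file entries; handles both list- and dict-shaped layouts (assumption)."""
    entries = state.get("dataset_file_entries", [])
    return entries.values() if isinstance(entries, dict) else entries

def chunk_for(relative_path, state):
    """Return the artifact/chunk name that should contain the given file, or None."""
    for entry in iter_file_entries(state):
        if entry.get("relative_path") == relative_path:
            return entry.get("artifact_name")
    return None

print(chunk_for("images/0001.png", state))
```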
What do you mean by draft? Do you want to actually run your pipeline, even just for testing purposes?
Thank you @<1523701070390366208:profile|CostlyOstrich36> 🤓
I can add a little piece of context.
- I want to give my users a way to pick a specific batch to get a file they need. Right now there is no way to download just one specific file from an entire dataset.
- I need a way to check whether a file has already been uploaded to some other dataset or not (see the sketch after this list).
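For the second point, something along these lines is what I mean (a sketch only; it assumes Dataset.list_datasets() and list_files() behave as in recent clearml releases, that the returned dicts carry an "id" key, that comparing by relative file name is good enough, and "my_project" is a placeholder):

```python
from clearml import Dataset

def file_already_uploaded(file_name, dataset_project="my_project"):
    """Return the id of the first dataset that already contains file_name, else None."""
    # list_datasets returns lightweight dicts (id, name, ...), not full Dataset objects
    for ds_info in Dataset.list_datasets(dataset_project=dataset_project):
        ds = Dataset.get(dataset_id=ds_info["id"])
        # list_files returns the relative paths stored in the dataset
        if any(path.endswith(file_name) for path in ds.list_files()):
            return ds_info["id"]
    return None

print(file_already_uploaded("images/0001.png"))
```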
Also, for some reason I don’t have the ability to copy pipelines. Tell me, is this normal?
Every time you click run, a pipeline run is cloned, so you can pass new parameters right from the UI.
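If it helps, this is the kind of thing that shows up as editable fields when a run is cloned (a minimal sketch with a hypothetical step and task names; it assumes the PipelineController API from recent clearml versions):

```python
from clearml import PipelineController

pipe = PipelineController(name="my_pipeline", project="my_project", version="1.0.0")

# Parameters declared here become editable fields in the UI when the pipeline run is cloned
pipe.add_parameter(name="dataset_name", default="my_dataset", description="dataset to process")

# A single hypothetical step that consumes the parameter
pipe.add_step(
    name="process",
    base_task_project="my_project",
    base_task_name="process_task",  # hypothetical pre-existing task to clone
    parameter_override={"General/dataset_name": "${pipeline.dataset_name}"},
)

pipe.start_locally(run_pipeline_steps_locally=True)
```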
It is deterministic. When you do Dataset.get(), ClearML downloads the state.json file, where you can see all relative file paths and chunk numbers.