
Also, for some reason I don't have the ability to copy pipelines. Is this normal?
Every time you click Run, the pipeline run is cloned, so you can pass new parameters right from the UI.
Let me add a bit of context:
- I want to give my users a way to pick a specific batch so they can get the file they need. Right now there is no way to download just one specific file from an entire dataset.
- I need a way to check whether a file has already been uploaded to some other dataset or not.
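For the second point, one possible approach (a minimal sketch, not a built-in ClearML feature) is to hash the local file and compare it against the file entries recorded in each dataset's state.json. It assumes you have already loaded each state.json into a dict, that the entries live under a `dataset_file_entries` list, and that each entry has a `hash` field holding a SHA-256 hex digest; those key names and the hash algorithm are assumptions, and `sha256_of_file` / `datasets_containing` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a local file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def datasets_containing(file_hash: str, states: Dict[str, dict]) -> List[str]:
    """Return the ids of datasets whose state.json lists a file with this hash.

    `states` maps a dataset id to its parsed state.json. The
    `dataset_file_entries` / `hash` keys are assumed, not confirmed.
    """
    hits = []
    for dataset_id, state in states.items():
        for entry in state.get("dataset_file_entries", []):
            if entry.get("hash") == file_hash:
                hits.append(dataset_id)
                break  # one match is enough for this dataset
    return hits
```

Matching on content hash rather than on the file name also catches a file that was uploaded under a different relative path.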
I also thought ClearML writes this mapping (state.json) into one of its databases: Mongo, Redis, or Elasticsearch.
It is deterministic. When you call Dataset.get(), ClearML downloads the state.json file, where you can see all relative file paths and their chunk numbers.
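The path-to-chunk mapping described above can be read straight from the downloaded state.json. A minimal sketch, assuming the file entries sit in a `dataset_file_entries` list and each entry carries `relative_path` and `artifact_name` keys (this layout is an assumption based on this thread, not a documented schema); `file_to_artifact` is a hypothetical helper.

```python
import json
from typing import Dict, Optional


def file_to_artifact(state_json_text: str) -> Dict[str, Optional[str]]:
    """Map each file's relative path to the artifact (chunk) name that stores it."""
    state = json.loads(state_json_text)
    mapping: Dict[str, Optional[str]] = {}
    for entry in state.get("dataset_file_entries", []):
        # Entries without an artifact_name key are mapped to None
        mapping[entry["relative_path"]] = entry.get("artifact_name")
    return mapping
```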
Hi @CostlyOstrich36. Thank you for your advice, it definitely makes sense. Regarding the first point, each dataset has a state.json file. In this file there is an artifact_name key (e.g., data, data_001, etc.) along with the relative path of each file. I thought I could map this key to the chunk number. So, if I pull this file from the S3 bucket, I can work out which chunk I should download to get a specific file. Am I wrong?
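If the naming follows the pattern in the question, the chunk number can be parsed from the artifact name. This sketch assumes `data` is chunk 0 and `data_NNN` is chunk NNN, which is the hypothesis being asked about rather than confirmed ClearML behavior; `chunk_index` is a hypothetical helper.

```python
import re


def chunk_index(artifact_name: str) -> int:
    """Parse a chunk number from an artifact name (assumed naming scheme).

    "data" -> 0, "data_001" -> 1, "data_012" -> 12.
    """
    if artifact_name == "data":
        return 0
    match = re.fullmatch(r"data_(\d+)", artifact_name)
    if match is None:
        raise ValueError(f"unexpected artifact name: {artifact_name!r}")
    return int(match.group(1))
```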
Thank you @CostlyOstrich36 🤓
What do you mean by draft? Do you want to actually run your pipeline, even for testing purposes?
You can get the number of the chunk that contains your file and download only that chunk.
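In code, this could look roughly like the following. It relies on Dataset.get() and on the `part` argument of Dataset.get_local_copy(), which downloads a single part of the dataset; when `num_parts` is not given, ClearML treats each chunk as one part, counting chunks of parent datasets as well. The sketch therefore assumes a dataset without parents, so the chunk number read from state.json can be passed directly as `part`. `download_chunk` is a hypothetical helper.

```python
def download_chunk(dataset_id: str, chunk: int) -> str:
    """Download only one chunk of a ClearML dataset and return its local folder.

    Assumes the dataset has no parent datasets, so the chunk number read from
    state.json equals the part index expected by get_local_copy().
    """
    from clearml import Dataset  # imported here so the sketch loads without clearml

    dataset = Dataset.get(dataset_id=dataset_id)
    # With num_parts omitted, the number of parts equals the number of chunks,
    # so part=chunk should fetch just that chunk into the local cache.
    return dataset.get_local_copy(part=chunk)
```

The returned path points into ClearML's local cache and should be treated as read-only.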