Hi @<1523701279472226304:profile|SoreHorse95> ! add_external_files only stores the links; the files themselves are not copied. If a file changes and you don't have a dataset with updated links, I would expect some caching mechanisms to break, so some files may not be cached, or may not be downloaded again into the cache, after getting the dataset.
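One way to avoid stale links is to register a new dataset version whenever the ETL output changes. The snippet below is only a rough sketch of that idea, not an official ClearML recipe. The helpers fingerprint, needs_new_version and register_version are hypothetical names, and the fingerprinting assumes the ETL job can read the Parquet files from a local or mounted path. The ClearML calls (Dataset.create, add_external_files, upload, finalize) need a configured ClearML setup, and the URL, dataset name and project passed to them are placeholders.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return one digest over the names and contents of the given files."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def needs_new_version(paths, last_fingerprint):
    """True if the files differ from what the last dataset version recorded."""
    return fingerprint(paths) != last_fingerprint


def register_version(source_url, name, project, parent_ids=None):
    """Create a new dataset version that links to the files at source_url.

    Hypothetical helper: source_url, name and project are placeholders.
    """
    # Imported lazily so the helpers above work without ClearML installed.
    from clearml import Dataset

    ds = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=parent_ids,
    )
    ds.add_external_files(source_url=source_url)  # stores links only
    ds.upload()
    ds.finalize()
    return ds.id
```

The ETL job could store the fingerprint next to its output and call register_version only when needs_new_version returns True, so every finalized dataset version points at files that matched it at registration time.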
Answered
Are ClearML Datasets Intended To Be Static, Or Can They Be Dynamic?
Are ClearML datasets intended to be static, or can they be dynamic? We intend to have an ETL pipeline running in a Databricks environment, digesting complex production data into tabular data ready for ML applications. The output of the ETL pipeline will be Parquet files in cloud (Azure) storage. The intention is then to register this resource as a dataset in ClearML.
Question: If we create a Dataset in ClearML with .add_external_files pointing at our cloud storage Parquet files, will they be copied "physically" onto the data server upon creation, or are they simply kept as links? If the latter, will things break if the external files change?
9 months ago