Are ClearML Datasets Intended To Be Static, Or Can They Be Dynamic?

Are ClearML datasets intended to be static, or can they be dynamic? We intend to have an ETL pipeline running in a Databricks environment that digests complex production data into tabular data ready for ML applications. The output of the ETL pipelines will be Parquet files in cloud (Azure) storage. The intention is then to register this resource as a dataset in ClearML.

Question: If we create a Dataset in ClearML with .add_external_files pointing at the Parquet files in our cloud storage, will the files be physically copied onto the data server when the dataset is created, or are they kept only as links? If the latter, will stuff break if the external files change?

Posted 2 months ago
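For illustration, the registration step described in the question might look roughly like this. It is a minimal sketch that assumes the ClearML Python SDK (Dataset.create, add_external_files, upload, finalize) and Azure credentials already configured in clearml.conf. The helper names azure_url and register_parquet_links are hypothetical, and the azure://<account>.blob.core.windows.net/<container>/<path> URL form is an assumption to check against the ClearML storage documentation for your setup.

```python
# Hypothetical helpers; the ClearML calls used are Dataset.create,
# add_external_files, upload and finalize from the ClearML Python SDK.
from typing import Optional


def azure_url(account: str, container: str, prefix: str) -> str:
    """Build an Azure Blob URL (assumed form: azure://<account>.blob.core.windows.net/<container>/<path>)."""
    return f"azure://{account}.blob.core.windows.net/{container}/{prefix.lstrip('/')}"


def register_parquet_links(
    source_url: str,
    dataset_project: str,
    dataset_name: str,
    parent_id: Optional[str] = None,
) -> str:
    """Register external Parquet files as a ClearML dataset version and return its id."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
        # Optional parent: chaining versions keeps a history as the ETL output changes.
        parent_datasets=[parent_id] if parent_id else None,
    )
    # Records links to the matching files; the Parquet data itself stays in Azure.
    dataset.add_external_files(source_url=source_url, wildcard="*.parquet")
    # Same upload -> finalize sequence used for local files; nothing is copied for links.
    dataset.upload()
    dataset.finalize()
    return dataset.id
```

A call would then look like register_parquet_links(azure_url("myaccount", "etl-output", "features/"), "ETL", "features"), where the account, container, project and dataset names are placeholders.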


Hi @SoreHorse95! add_external_files only stores the links; the files themselves are not copied to the data server. If the files change and you don't have a dataset with the updated links, I would expect some caching mechanisms to break: some files may not be cached, or may not be downloaded into the cache again, when you get the dataset.
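Since only links are stored, one way to avoid stale links is to check whether the external files changed before deciding to register a new dataset version. The sketch below is a plain manifest diff, not a ClearML feature: it assumes you can take a snapshot mapping each file URL to a fingerprint such as its ETag. For example, ContainerClient.list_blobs() in the azure-storage-blob SDK returns BlobProperties with name and etag, but that listing is not shown here. The names Manifest, LinkDiff and diff_manifests are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Hypothetical snapshot format: external file URL -> content fingerprint (e.g. ETag).
Manifest = Dict[str, str]


class LinkDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    modified: List[str]

    @property
    def changed(self) -> bool:
        # True if any link was added, removed, or now points at different content.
        return bool(self.added or self.removed or self.modified)


def diff_manifests(previous: Manifest, current: Manifest) -> LinkDiff:
    """Compare two snapshots of the external files behind a dataset version."""
    added = sorted(url for url in current if url not in previous)
    removed = sorted(url for url in previous if url not in current)
    modified = sorted(
        url for url in current if url in previous and current[url] != previous[url]
    )
    return LinkDiff(added, removed, modified)
```

If diff_manifests(...).changed is true, the safer option is to register a new dataset version (for example a Dataset.create(..., parent_datasets=[previous_id]) followed by add_external_files) rather than keep using a version whose links now point at different content.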
