Answered

Hey everyone,

As a Pro-tier SaaS user, I'm experiencing very high latency when finalizing a dataset. It sits in a large dataset version hierarchy, and recently the finalize() call has been taking ~10 minutes to complete. Could there be some big recursive diff operation taking all that time?

Here's a quick overview of the code:

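import clearml

# constants.TRANSFORMED_DATA_FILE is defined elsewhere in the poster's project.

# Fetch the latest version of the dataset, creating it if it does not exist yet.
last_dataset = clearml.Dataset.get(
    dataset_project='MyProject',
    dataset_name='DatasetPreTraining',
    auto_create=True
)

# Reuse the latest version while it is still open; otherwise create a child version.
if not last_dataset.is_final():
    dataset = last_dataset
else:
    dataset = clearml.Dataset.create(
        dataset_project='MyProject',
        dataset_name='DatasetPreTraining',
        parent_datasets=[last_dataset.id],
    )

dataset.add_files(constants.TRANSFORMED_DATA_FILE)

dataset.upload()
dataset.finalize()  # this is the call that takes ~10 minutes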
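To confirm that finalize() is the slow step rather than upload(), each call can be timed separately. The following is only a minimal sketch: the `timed` helper and the `timings` dict are hypothetical names, and `time.sleep` stands in for the real ClearML calls so the snippet runs without a server.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("upload", timings):
    time.sleep(0.01)  # stand-in for dataset.upload()

with timed("finalize", timings):
    time.sleep(0.02)  # stand-in for dataset.finalize()

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

In the real script, the two `time.sleep` lines would be replaced by `dataset.upload()` and `dataset.finalize()`.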
  
  
Posted 5 months ago

Answers 7


Hi @FierceHamster54, how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?

  
  
Posted 5 months ago

Pruning old ancestors sounds like the right move for now.

  
  
Posted 4 months ago

Or do I have to add a pipeline step to prune ancestors that are too old?
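A pruning step would first need to decide which ancestors count as "too old". Here is a rough sketch of that selection for a linear hierarchy, assuming the hierarchy is available as a dict mapping each version ID to its list of parent IDs (the shape I believe `Dataset.get_dependency_graph()` returns). The function name, the `keep_last` parameter, and the version IDs are hypothetical, and what to do with the selected versions afterwards is deliberately left out.

```python
def ancestors_to_prune(graph, latest_id, keep_last):
    """Walk a linear chain from latest_id toward the root.

    Returns the version IDs that fall outside the newest `keep_last`
    versions (the latest version counts as one of them), newest first.
    """
    chain = []
    current = latest_id
    while current is not None:
        chain.append(current)
        parents = graph.get(current, [])
        if len(parents) > 1:
            raise ValueError("expected a linear hierarchy (one parent per version)")
        current = parents[0] if parents else None
    return chain[keep_last:]


# Hypothetical four-version chain: v0 <- v1 <- v2 <- v3
graph = {"v3": ["v2"], "v2": ["v1"], "v1": ["v0"], "v0": []}

print(ancestors_to_prune(graph, "v3", keep_last=2))  # ['v1', 'v0']
```

Keeping the selection separate from the actual removal makes it easy to dry-run the policy before touching any dataset version.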

  
  
Posted 4 months ago

Thanks a lot @SmugDolphin23 ❤

  
  
Posted 4 months ago

Hey @SuccessfulKoala55, this is a fairly small dataset with a linear hierarchy of ~300 versions and a size of ~2 GB.

  
  
Posted 4 months ago

In the meantime, is there some way to set a retention policy for the dataset versions?

  
  
Posted 4 months ago

Hi @FierceHamster54! Looks like we pull all the ancestors of a dataset when we finalize. I think this can be optimized. We will keep you posted when we make some improvements.
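To illustrate why that scales with the depth of the hierarchy, here is a small sketch that counts how many ancestors a single version has in a linear chain of 300 versions. It is not the SDK's actual code: `collect_ancestors` and the `v0`..`v299` IDs are hypothetical, and the graph is a plain dict of version ID to parent IDs.

```python
def collect_ancestors(graph, dataset_id):
    """Return the set of all ancestor IDs reachable from dataset_id."""
    seen = set()
    stack = list(graph.get(dataset_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


# Hypothetical linear chain: v0 <- v1 <- ... <- v299
graph = {"v0": []}
for i in range(1, 300):
    graph[f"v{i}"] = [f"v{i - 1}"]

ancestors = collect_ancestors(graph, "v299")
print(len(ancestors))  # 299
```

If each ancestor costs a round trip to the server, every new version makes the next finalize a little slower, which would match latency creeping up over time.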

  
  
Posted 4 months ago