
Hey everyone,

As a Pro-tier SaaS user, I'm experiencing very high latency when finalizing a dataset. It is part of a large dataset version hierarchy, and recently the finalize() call has started taking ~10 minutes to complete. Could some large recursive diff operation be taking all that time?

Here's a quick overview of the code:

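import clearml

import constants  # my own module; defines TRANSFORMED_DATA_FILE

# Fetch the latest version, creating the dataset if it does not exist yet
last_dataset = clearml.Dataset.get(
    dataset_project='MyProject',
    dataset_name='DatasetPreTraining',
    auto_create=True
)

# Reuse the latest version if it is still open, otherwise create a child version
if not last_dataset.is_final():
    dataset = last_dataset
else:
    dataset = clearml.Dataset.create(
        dataset_project='MyProject',
        dataset_name='DatasetPreTraining',
        parent_datasets=[last_dataset.id],
    )

dataset.add_files(constants.TRANSFORMED_DATA_FILE)

dataset.upload()
dataset.finalize()  # this is the call that takes ~10 minutes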
  
  
Posted one year ago

Answers 7


Hi @FierceHamster54! It looks like we pull all the ancestors of a dataset when we finalize. I think this can be optimized. We will keep you posted when we make some improvements.
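
To get a feel for how many ancestors a finalize would have to walk, here is a minimal sketch that counts them from a parent map. It assumes the map has the shape `{dataset_id: [parent_ids]}`, which is what I understand `Dataset.get_dependency_graph()` to return; that shape is an assumption, and `collect_ancestors` and the `v1`..`v3` ids are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def collect_ancestors(graph: Dict[str, List[str]], dataset_id: str) -> Set[str]:
    """Return every ancestor id reachable from dataset_id via parent links."""
    seen: Set[str] = set()
    queue = deque(graph.get(dataset_id, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already visited (possible when several parents share an ancestor)
        seen.add(parent)
        queue.extend(graph.get(parent, []))
    return seen


# Hypothetical linear chain v3 -> v2 -> v1, shaped like {dataset_id: [parent_ids]}.
example_graph = {"v3": ["v2"], "v2": ["v1"], "v1": []}
ancestor_count = len(collect_ancestors(example_graph, "v3"))
print(ancestor_count)
```

With a linear hierarchy, the count grows by one for every new version, so the number of ancestors touched per finalize keeps increasing as the chain gets longer.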

  
  
Posted one year ago

Hey @SuccessfulKoala55, this is a fairly small dataset with a linear hierarchy of ~300 versions and a size of ~2 GB.

  
  
Posted one year ago

In the meantime, is there some way to set a retention policy for the dataset versions?

  
  
Posted one year ago

Pruning old ancestors sounds like the right move for now.
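
A retention rule for such a pruning step could be sketched roughly as follows. This only selects which versions are candidates; it does not call ClearML. The record shape (`id` and `created` keys), the function name `select_stale_versions`, and the `keep_last` / `max_age` parameters are all assumptions made for illustration, not part of the ClearML API.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_stale_versions(
    versions: List[Dict],
    keep_last: int,
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return ids of versions outside the newest keep_last that are older than max_age."""
    # Newest first, so the first keep_last entries are always retained.
    ordered = sorted(versions, key=lambda v: v["created"], reverse=True)
    candidates = ordered[keep_last:]
    return [v["id"] for v in candidates if now - v["created"] > max_age]


# Hypothetical version records; real ones would come from the dataset listing.
versions = [
    {"id": "a", "created": datetime(2024, 1, 1)},
    {"id": "b", "created": datetime(2024, 1, 5)},
    {"id": "c", "created": datetime(2024, 1, 9)},
]
stale = select_stale_versions(
    versions, keep_last=1, max_age=timedelta(days=7), now=datetime(2024, 1, 10)
)
print(stale)
```

One caveat: in a linear hierarchy, child versions build on their parents, so simply deleting an old ancestor may break its descendants. Squashing the old versions into a single one first (`Dataset.squash`) and only then deleting them (`Dataset.delete`) may be the safer route, but please check the ClearML docs before relying on that.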

  
  
Posted one year ago

Thanks a lot @SmugDolphin23 ❤

  
  
Posted one year ago

Or do I have to add a pipeline step to prune ancestors that are too old?

  
  
Posted one year ago

Hi @FierceHamster54, how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?

  
  
Posted one year ago