Answered
Hey everyone,

As a Pro-tier SaaS user, I'm experiencing very high latency when finalizing a dataset. It is attached to a large dataset version hierarchy, and recently the finalize() call has been taking ~10 minutes to complete. Could some big recursive diff operation be taking all that time?

Here's a quick overview of the code:

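import clearml

import constants  # assumed: project-specific module defining TRANSFORMED_DATA_FILE

# Fetch the latest version, creating the dataset if it does not exist yet.
last_dataset = clearml.Dataset.get(
    dataset_project='MyProject',
    dataset_name='DatasetPreTraining',
    auto_create=True
)

# Reuse the latest version if it is still open; otherwise create a child version.
if not last_dataset.is_final():
    dataset = last_dataset
else:
    dataset = clearml.Dataset.create(
        dataset_project='MyProject',
        dataset_name='DatasetPreTraining',
        parent_datasets=[last_dataset.id],
    )

dataset.add_files(constants.TRANSFORMED_DATA_FILE)

dataset.upload()
dataset.finalize()  # this is the call that takes ~10 minutes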
  
  
Posted 11 months ago

Answers 7


Pruning old ancestors sounds like the right move for now.

  
  
Posted 11 months ago

Hey @<1523701087100473344:profile|SuccessfulKoala55> this is a fairly small dataset with a linear hierarchy of ~300 versions and a total size of ~2 GB

  
  
Posted 11 months ago

Hi @<1523702000586330112:profile|FierceHamster54> ! Looks like we pull all the ancestors of a dataset when we finalize. I think this can be optimized. We will keep you posted when we make some improvements.
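To illustrate why pulling every ancestor would hurt on a long linear chain, here is a toy model. It is not ClearML's actual internals: `collect_ancestors` and the `parents` mapping are hypothetical names made up for this sketch. It only counts how many ancestor records would need to be looked up for one finalize.

```python
from typing import Dict, List


def collect_ancestors(dataset_id: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every ancestor of dataset_id, visiting each one exactly once."""
    seen = set()
    order = []
    stack = list(parents.get(dataset_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        # Keep walking up: the parents of this ancestor are ancestors too.
        stack.extend(parents.get(current, []))
    return order


# Linear hierarchy of 300 versions: v1 has no parent, vN points at vN-1.
parents = {f"v{i}": [f"v{i - 1}"] for i in range(2, 301)}
parents["v1"] = []

ancestors = collect_ancestors("v300", parents)
print(len(ancestors))  # 299 ancestor lookups for a single finalize
```

If each lookup involved a server round trip, the cost of finalizing would grow with every new version appended to the chain, which would match the slowdown appearing "since recently".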

  
  
Posted 11 months ago

In the meantime, is there some way to set a retention policy for the dataset versions?

  
  
Posted 11 months ago

Or do I have to add a pipeline step to prune ancestors that are too old?
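For the pruning-step option, the selection logic could look roughly like the sketch below. It is plain Python with no ClearML calls: `select_versions_to_prune` is a hypothetical helper, and the `id` / `created` keys on each version record are an assumption about how the version list would be represented.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def select_versions_to_prune(
    versions: List[Dict[str, Any]],
    keep_last: int,
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return ids of versions outside the newest keep_last AND older than max_age."""
    newest_first = sorted(versions, key=lambda v: v["created"], reverse=True)
    cutoff = now - max_age
    # Never touch the newest keep_last versions, whatever their age.
    return [v["id"] for v in newest_first[keep_last:] if v["created"] < cutoff]


# Sample data (made up): versions created 1, 5, 20, 45 and 90 days ago.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
versions = [
    {"id": f"v{days}", "created": now - timedelta(days=days)}
    for days in (1, 5, 20, 45, 90)
]

to_prune = select_versions_to_prune(
    versions, keep_last=3, max_age=timedelta(days=30), now=now
)
print(to_prune)  # ['v45', 'v90']
```

As far as I know, the version list could come from `Dataset.list_datasets(...)` and the removal from `Dataset.delete(...)`. In a linear chain every old version is still a parent of a newer one, so the chain would probably need to be flattened first (for example with `Dataset.squash(...)`) before old ancestors can be deleted safely.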

  
  
Posted 11 months ago

Hi @<1523702000586330112:profile|FierceHamster54> , how big is the version hierarchy? Can you provide some details on the structure? Also, how many files are in the dataset and what are their sizes?

  
  
Posted 11 months ago

Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤

  
  
Posted 11 months ago