Answered
Hi everyone! I've noticed that if I run an experiment and it fails, the ClearML agent will delete all datasets that have been downloaded during the run. Is it correct behavior? How can I force the agent to preserve such datasets?

  
  
Posted 2 years ago

Answers 9


So when you say the files are deleted, how can you tell? Where did you look for them?

  
  
Posted 2 years ago

ClearML agent will delete all datasets

I'm not sure I understood how you've run the agent...

  
  
Posted 2 years ago

Hi ExcitedSeaurchin87, I think the files are being downloaded to the cache, and the cache simply overwrites older files. How are you running the agent exactly?
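
For context, here's a minimal sketch of how the cache behaves (the project/dataset names are placeholders, and the size-limit key is from the default clearml.conf, so double-check it against your version):

from clearml import Dataset

# Dataset.get() only fetches metadata; get_local_copy() downloads the files
# into the local cache (by default under ~/.clearml/cache)
ds = Dataset.get(dataset_project="some_project", dataset_name="some_dataset")

path_a = ds.get_local_copy()
path_b = ds.get_local_copy()  # no second download - served from the cache
assert path_a == path_b

# When the cache grows past its entry limit
# (sdk.storage.cache.default_cache_manager_size, 100 entries by default),
# the oldest entries are evicted, which can look like deleted datasets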

  
  
Posted 2 years ago

Sorry. I probably misunderstood you. I just downloaded the clearml-agent package to my machine and ran the agent with the following command: python -m clearml_agent daemon --queue default dinara --docker --detached

  
  
Posted 2 years ago

OK

  
  
Posted 2 years ago

From an efficiency perspective, we should be pulling data as we feed it into training. That said, it's always a good idea to uncompress large zip files and store them as smaller ones so you can batch-pull for training.
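
As a rough sketch of what batch pulling could look like (placeholder names again; this assumes your clearml version supports the part/num_parts arguments of get_local_copy()):

from clearml import Dataset

ds = Dataset.get(dataset_project="some_project", dataset_name="some_dataset")

# pull the dataset in four chunks instead of one big download,
# feeding each chunk into training before fetching the next
num_parts = 4
for part in range(num_parts):
    chunk_path = ds.get_local_copy(part=part, num_parts=num_parts)
    # hand the files under chunk_path to the training loop here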

  
  
Posted 2 years ago

Yes, that's correct. I don't want to re-download datasets because of their large size.

  
  
Posted 2 years ago

ExcitedSeaurchin87, Hi 🙂

I think it's correct behavior - you wouldn't want leftover files flooding your computer.

Regarding preserving the datasets - I'm guessing you're doing the pre-processing & training in the same task, so if the training fails you don't want to re-download the data?
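
If that's the case, one way to keep a copy that outlives the run is get_mutable_local_copy(), which copies the files into a folder you own instead of the managed cache (a sketch only - the target folder is made up):

from clearml import Dataset

ds = Dataset.get(dataset_project="some_project", dataset_name="some_dataset")

# unlike get_local_copy(), this writes to a folder outside the cache,
# so cache eviction or cleanup won't remove it
persistent_path = ds.get_mutable_local_copy(target_folder="/data/datasets/some_dataset")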

  
  
Posted 2 years ago

SuccessfulKoala55
I initialized the task with Python:

from clearml import Task

task = Task.init(project_name=args.project_name, task_name=args.task_name)

and downloaded a set of datasets later in the code:

import clearml

for dataset_name in datasets_list:
    clearml_dataset = clearml.Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    clearml_dataset_path = clearml_dataset.get_local_copy()

Then I go through the resulting directories in search of the files I need, and send their paths to a PyTorch dataset object. If the run fails somewhere later, I want to preserve the downloaded datasets.
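
To give a fuller picture, the file-gathering step looks roughly like this (a simplified sketch - FileListDataset and the glob are illustrative, not my exact code):

from pathlib import Path
from torch.utils.data import Dataset

class FileListDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = list(file_paths)

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # load and return the sample stored at self.file_paths[idx]
        return self.file_paths[idx]

file_paths = [p for p in Path(clearml_dataset_path).rglob("*") if p.is_file()]
torch_dataset = FileListDataset(file_paths)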

  
  
Posted 2 years ago