Answered
Hello! I'M Using A

Hello!
I'm using the _delete_artifacts() function (https://github.com/allegroai/clearml/blob/25df5efe74972624671df2ae97a3c629eb0c5322/clearml/backend_interface/task/task.py#L1360) to implement a "keep N best checkpoints" logic in my training loop. I need my process to keep running even if it fails to connect to the ClearML server at some point. However, when I test this case (using a self-hosted ClearML), my code gets stuck in an infinite loop of retries inside that function.
The function sends (https://github.com/allegroai/clearml/blob/master/clearml/backend_interface/task/task.py#L1377) a delete request with a raise_on_errors=False flag. I can't see why it has to be False, since that prevents the program from breaking out of the retry loop. If I set it to True in the debugger, everything works OK: I can catch the exception and postpone the deletion of my artifact.
Could you kindly explain to me if there's some particular reason for this behaviour?
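
For context, the "catch the exception and postpone the deletion" idea described above could look roughly like the sketch below. It does not call ClearML at all: BestCheckpointKeeper and the delete_remote callback are hypothetical names, and the sketch assumes the callback raises an exception when the server is unreachable (which is what the question observes with raise_on_errors=True).

```python
import heapq
from typing import Callable, List, Tuple


class BestCheckpointKeeper:
    """Keep the N best checkpoints; retry failed remote deletions later."""

    def __init__(self, keep_n: int, delete_remote: Callable[[str], None]):
        self.keep_n = keep_n
        # Hypothetical callback: assumed to raise when the server is unreachable.
        self.delete_remote = delete_remote
        # Min-heap ordered by score, so the worst kept checkpoint is on top.
        self._heap: List[Tuple[float, str]] = []
        # Names whose remote deletion failed and was postponed.
        self.pending: List[str] = []

    def add(self, name: str, score: float) -> None:
        """Register a new checkpoint (higher score is better)."""
        heapq.heappush(self._heap, (score, name))
        while len(self._heap) > self.keep_n:
            _, worst = heapq.heappop(self._heap)
            self.pending.append(worst)
        self.flush()

    def flush(self) -> None:
        """Try to delete everything pending; keep what fails for the next attempt."""
        still_pending = []
        for name in self.pending:
            try:
                self.delete_remote(name)
            except Exception:
                still_pending.append(name)
        self.pending = still_pending

    def kept(self) -> List[str]:
        return sorted(name for _, name in self._heap)
```

Training continues whether or not a deletion succeeds; anything left in pending is retried on the next add() or on an explicit flush().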

  
  
Posted one year ago

Answers 5


The function sends a delete request with a raise_on_errors=False flag.

Are you saying we should expose raise_on_errors in the _delete_artifacts() function itself?
If so, sure, seems logical to me. Any chance you want to PR it? (Please just make sure the default value is still False so we keep backwards compatibility.)
wdyt?
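
A rough sketch of what exposing the flag with a backwards-compatible default might look like. This is not the ClearML source: FakeTask and its send() method are hypothetical stand-ins, and the stand-in simply returns None on a quiet failure instead of retrying.

```python
class FakeTask:
    """Hypothetical stand-in for a task object; not the ClearML implementation."""

    def __init__(self, server_up: bool = True):
        self.server_up = server_up
        self.artifacts = {"ckpt_1", "ckpt_2"}

    def send(self, names, raise_on_errors=True):
        # Stand-in for the request call: fail loudly or quietly depending on the flag.
        if not self.server_up:
            if raise_on_errors:
                raise ConnectionError("server unreachable")
            return None
        self.artifacts -= set(names)
        return True

    def _delete_artifacts(self, artifact_names, raise_on_errors=False):
        # The default stays False, so existing callers see unchanged behaviour.
        res = self.send(artifact_names, raise_on_errors=raise_on_errors)
        return bool(res)
```

A caller that wants to handle the failure itself would pass raise_on_errors=True and wrap the call in try/except; everyone else keeps getting a plain True/False.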

  
  
Posted one year ago

Hi SillySealion58

"keep N best checkpoints" logic in my training loop.

If this is the use case, may I suggest overwriting them locally? (The same will happen on the remote storage.) This is exactly how the lightning / ignite feature is implemented.
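
One way to picture the "overwrite locally" approach is a fixed set of N file names, where a better checkpoint replaces the worst slot. The sketch below is an illustration only: SlotCheckpoints and the best_<slot>.ckpt naming are hypothetical, it is not taken from lightning or ignite, and it relies on the statement above that re-saving the same file also replaces the remote copy.

```python
import os
from typing import List, Optional


class SlotCheckpoints:
    """Keep at most n_slots checkpoint files by overwriting the worst one."""

    def __init__(self, directory: str, n_slots: int):
        self.directory = directory
        # Score stored per slot; None means the slot is still empty.
        self.scores: List[Optional[float]] = [None] * n_slots

    def save(self, payload: bytes, score: float) -> Optional[str]:
        """Write payload into a slot; return its path, or None if it is not good enough."""
        if None in self.scores:
            slot = self.scores.index(None)  # fill empty slots first
        else:
            slot = min(range(len(self.scores)), key=lambda i: self.scores[i])
            if score <= self.scores[slot]:
                return None  # not better than the worst kept checkpoint
        path = os.path.join(self.directory, f"best_{slot}.ckpt")
        with open(path, "wb") as f:  # same file name, so the old file is overwritten
            f.write(payload)
        self.scores[slot] = score
        return path
```

Because the file names never change, no explicit delete request is needed; the trade-off is that local and remote storage always hold the same N files.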

  
  
Posted one year ago

suggest overwriting them locally?

Yeah, that might be an option but it doesn't have enough flexibility for all my scenarios. E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.

  
  
Posted one year ago

E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.

Hmm yes, that makes sense

That'd be a great solution, thanks! I'll create a PR shortly

Thank you! 🙏 🤩

  
  
Posted one year ago

Are you saying we should expose raise_on_errors in the _delete_artifacts() function itself?

That'd be a great solution, thanks! I'll create a PR shortly

  
  
Posted one year ago