Answered

Another question: I have written code that includes a task scheduler that calls a function. That function watches a folder and, if there are enough images, creates and publishes the dataset, then clears the folder.

The problem: for some reason it's creating too many anonymous tasks, which just stay running. Any help would be appreciated.

  
  
Posted 2 years ago

Answers 16


🤞

  
  
Posted 2 years ago

I'll test it with the updated one.

  
  
Posted 2 years ago

VexedCat68 I think this is the issue described here:
https://github.com/allegroai/clearml/issues/491
Can you test with the latest RC:
pip install clearml==1.1.5rc1

  
  
Posted 2 years ago

Can you spot something here? To me it still looks like it should only create a new Dataset object if the batch size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.

Once the data is published, a dataset trigger is activated in the checkbox_.... file, which creates a clearml-task for training the model.
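
For reference, the intended behavior described above could be sketched roughly as follows. This is not the code from the thread, only a minimal illustration: IMAGE_SUFFIXES, BATCH_SIZE, collect_batch, process_folder and the publish callback are all hypothetical names, and in a real setup publish would wrap the ClearML Dataset calls.

```python
from pathlib import Path
from typing import Callable, List

# Assumptions for illustration only: which files count as images,
# and how many are needed before a dataset is worth publishing.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
BATCH_SIZE = 3


def collect_batch(folder: Path, batch_size: int = BATCH_SIZE) -> List[Path]:
    """Return the images in `folder` if there are at least `batch_size`, else []."""
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return images if len(images) >= batch_size else []


def process_folder(
    folder: Path,
    publish: Callable[[List[Path]], None],
    batch_size: int = BATCH_SIZE,
) -> bool:
    """Publish and clear the batch only when the threshold is met."""
    batch = collect_batch(folder, batch_size)
    if not batch:
        # Below the threshold: return before anything dataset-related is created.
        return False
    publish(batch)
    # Delete the images only after publish() returned without raising.
    for path in batch:
        path.unlink()
    return True
```

The point of the early return is that nothing dataset-related happens on a run where the folder is still below the threshold, so such a run should not leave anything behind.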

  
  
Posted 2 years ago

Let me share the code with you, and how I think the pieces interact with each other.

  
  
Posted 2 years ago

VexedCat68

a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.

From this description it sounds like you created a trigger cycle, am I missing something?
Basically, you can break the cycle by triggering only on a new Dataset with a specific tag (or by creating the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "original" Dataset trigger.
Make sense?
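
A rough sketch of the tag-based variant, assuming TriggerScheduler.add_dataset_trigger accepts trigger_project and trigger_on_tags as in the ClearML automation docs as I recall them. The helper names, the trigger name and all argument values are placeholders, not anything from the thread.

```python
from typing import Any, Dict


def dataset_trigger_kwargs(task_id: str, queue: str, project: str, tag: str) -> Dict[str, Any]:
    """Build arguments for a dataset trigger restricted to one project and one tag."""
    return {
        "name": f"train-on-{tag}",      # hypothetical trigger name
        "schedule_task_id": task_id,    # the training task to clone and enqueue
        "schedule_queue": queue,
        "trigger_project": project,     # only watch datasets in this project
        "trigger_on_tags": [tag],       # only fire for datasets carrying this tag
    }


def start_trigger(kwargs: Dict[str, Any]) -> None:
    """Register the trigger and start polling (needs a configured ClearML server)."""
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml.automation import TriggerScheduler

    trigger = TriggerScheduler()
    trigger.add_dataset_trigger(**kwargs)
    trigger.start()
```

With something like this, datasets created automatically without the tag (or in another project) should not fire the training trigger.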

  
  
Posted 2 years ago

Because those spawned processes come from a file register_dataset.py. However, I'm personally not using any file like that, and I think it's a file from the library.

  
  
Posted 2 years ago

AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?

  
  
Posted 2 years ago

Let me tell you what I think is happening and you can correct me where I'm going wrong.

Under certain conditions at certain times, a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.

N publishes = N Triggers = N Anonymous Tasks, right?

  
  
Posted 2 years ago

why are there indefinitely growing anonymous tasks, even after i've closed the main schedulers.

The anonymous Tasks are the Datasets you are creating (a Dataset version is also a Task of a certain type with artifacts; the idea is that Datasets are usually created from code, hence the need to combine the two).
Make sense?

  
  
Posted 2 years ago

I'm still a bit confused: since my function runs once per hour, why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?

  
  
Posted 2 years ago

Hi VexedCat68

The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.

Based on the screenshots these are the Datasets (which are also a Task with specific type etc).
I would actually name the Datasets you are creating.
You need to specify the parent version (i.e. how would it know it is a child dataset changeset).
I'm assuming they are all uploading everything, hence still running?
BTW: you can use the argument single_instance=True, making sure that no new function callback is created until the previous one has completed.
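
Put together, those suggestions might look something like the sketch below. It assumes the Dataset.create arguments dataset_name, dataset_project and parent_datasets, and the TaskScheduler.add_task arguments schedule_function, hour and single_instance, as I recall them from the ClearML docs. The helper names, the scheduler entry name and all values are placeholders.

```python
from typing import Any, Callable, Dict, Optional


def dataset_create_kwargs(name: str, project: str, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Arguments for Dataset.create: an explicit name, plus the parent version if any."""
    kwargs: Dict[str, Any] = {"dataset_name": name, "dataset_project": project}
    if parent_id:
        # With a parent, the new version is a changeset on top of it.
        kwargs["parent_datasets"] = [parent_id]
    return kwargs


def publish_images(folder: str, name: str, project: str, parent_id: Optional[str] = None) -> str:
    """Create, upload, finalize and publish one dataset version; returns its ID."""
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(**dataset_create_kwargs(name, project, parent_id))
    dataset.add_files(path=folder)
    dataset.upload()
    dataset.finalize()  # closes this dataset version
    dataset.publish()
    return dataset.id


def start_scheduler(callback: Callable[[], None]) -> None:
    """Run `callback` every hour, never overlapping with a previous run."""
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(
        name="watch-folder",         # hypothetical schedule name
        schedule_function=callback,
        hour=1,                      # assumption: hour=1 means "every 1 hour"
        single_instance=True,        # skip a run while the previous one is still going
    )
    scheduler.start()
```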

  
  
Posted 2 years ago

Apparently it keeps calling this register_dataset.py script.

  
  
Posted 2 years ago

The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.

  
  
Posted 2 years ago

image

  
  
Posted 2 years ago

Can you share the code and the way you're running it?

  
  
Posted 2 years ago