Another Question, I Have Written A Code That Includes A Task Scheduler That Calls A Function. That Function Watches A Folder And If There Are Sufficient Images, It Creates And Publishes The Dataset, After Which It Clears The Folder. Problem, For Some Rea

Answered

Another question: I have written some code that includes a task scheduler that calls a function. That function watches a folder and, if there are sufficient images, creates and publishes the dataset, after which it clears the folder.

The problem is that it's creating too many anonymous tasks, which just stay running for some reason. Any help would be appreciated.

  				
Posted 3 years ago by VexedCat68


Answers 16

Let me tell you what I think is happening and you can correct me where I'm going wrong.

Under certain conditions at certain times, a Dataset is published, which activates a Dataset trigger. So if I publish one dataset every day, I activate a Dataset trigger that day once it's published.

N publishes = N Triggers = N Anonymous Tasks, right?

  				
Posted 3 years ago by VexedCat68

Apparently it keeps calling this register_dataset.py script.

  				
Posted 3 years ago by VexedCat68

🤞

  				
Posted 3 years ago by AgitatedDove14

"Why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?"

The anonymous Tasks are the Datasets you are creating (a Dataset version is also a Task of a certain type with artifacts; the idea is that Datasets are usually created from code, hence the need to combine the two).
Make sense?

  				
Posted 3 years ago by AgitatedDove14

Can you spot something here? Because to me it still looks like it should only create a new Dataset object if the batch size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.

Once the data is published, a dataset trigger is activated in the checkbox_.... file, which creates a clearml-task for training the model.
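
A minimal sketch of the flow described above, not the poster's actual code (which is not shown in the thread). It only illustrates the "publish when the batch is full, then clear the folder" gate. The suffix list, dataset name and project are assumptions made for illustration. The ClearML calls are imported lazily inside publish_with_clearml so the gating helpers can be tried without a server.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: what counts as an image


def pending_images(folder):
    """Return the image files currently waiting in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def watch_folder(folder, batch_size, publish_batch):
    """Publish one dataset only when at least `batch_size` images are waiting.

    Returns True if a dataset was published, False if the call was a no-op.
    """
    images = pending_images(folder)
    if len(images) < batch_size:
        return False  # not enough data yet, so no Dataset is created at all
    publish_batch(folder)
    for image in images:  # clear only the files that were counted in this batch
        image.unlink()
    return True


def publish_with_clearml(folder):
    """Create, upload, finalize and publish a dataset from `folder`."""
    from clearml import Dataset  # lazy import: needs a configured ClearML setup

    dataset = Dataset.create(
        dataset_name="incoming-images",   # hypothetical name
        dataset_project="auto-datasets",  # hypothetical project
    )
    dataset.add_files(path=str(folder))
    dataset.upload()
    dataset.finalize()  # closes this dataset version
    dataset.publish()
```

From a scheduler callback this could be invoked as watch_folder("/data/incoming", 100, publish_with_clearml), where the path and batch size are placeholders. If a variant of this function creates the Dataset before checking the batch size, every scheduler tick would leave a new dataset Task behind, which is one thing worth ruling out.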

  				
Posted 3 years ago by VexedCat68

Hi VexedCat68

"The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks."

Based on the screenshots, these are the Datasets (which are also Tasks of a specific type).
I would actually name the Datasets you are creating.
You need to specify the parent version (i.e. how else would it know it is a child dataset changeset?).
I'm assuming they are all uploading everything, hence still running?
BTW: you can use the argument single_instance=True to make sure that no new function callback is created until the previous one has completed.
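
A rough sketch of how these suggestions might look in code, under stated assumptions: the dataset naming scheme, the project name and the schedule name are invented for illustration, and hour=1 is assumed to mean "every hour", as in the add_task docstring examples. Dataset.create with parent_datasets and TaskScheduler.add_task with single_instance are existing ClearML calls, imported lazily here so the argument builders can be inspected without a server.

```python
def dataset_create_kwargs(batch_index, parent_id=None):
    """Build arguments for Dataset.create: an explicit name plus an optional parent."""
    kwargs = {
        "dataset_name": f"images-batch-{batch_index:04d}",  # hypothetical naming scheme
        "dataset_project": "auto-datasets",                 # hypothetical project
    }
    if parent_id is not None:
        # With a parent, the new version is a child changeset of that dataset
        # rather than an unrelated standalone dataset.
        kwargs["parent_datasets"] = [parent_id]
    return kwargs


HOURLY_SCHEDULE = {
    "name": "watch image folder",  # hypothetical schedule name
    "hour": 1,                     # assumed to mean "every hour"
    "single_instance": True,       # do not launch while the previous run is still going
}


def create_dataset(batch_index, parent_id=None):
    """Create a named dataset version, optionally as a child of `parent_id`."""
    from clearml import Dataset  # lazy import: needs a configured ClearML setup

    return Dataset.create(**dataset_create_kwargs(batch_index, parent_id))


def start_scheduler(watch_function):
    """Run `watch_function` hourly, never overlapping with a previous run."""
    from clearml.automation import TaskScheduler  # lazy import

    scheduler = TaskScheduler()
    scheduler.add_task(schedule_function=watch_function, **HOURLY_SCHEDULE)
    scheduler.start()
```

Naming the datasets makes them easy to tell apart in the UI, and passing the previous version's ID as the parent lets each hourly run record only what changed.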

  				
Posted 3 years ago by AgitatedDove14

The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.

  				
Posted 3 years ago by VexedCat68

Can you share the code and the way you're running it?

  				
Posted 3 years ago by SuccessfulKoala55

  				
Posted 3 years ago by VexedCat68

VexedCat68 I think this is the issue described here:
https://github.com/allegroai/clearml/issues/491
Can you test with the latest RC:
pip install clearml==1.1.5rc1

  				
Posted 3 years ago by AgitatedDove14

I'll test it with the updated one.

  				
Posted 3 years ago by VexedCat68

I'm still a bit confused: since my function runs once per hour, why is there an indefinitely growing number of anonymous tasks, even after I've closed the main schedulers?

  				
Posted 3 years ago by VexedCat68

AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?

  				
Posted 3 years ago by VexedCat68

VexedCat68

"a Dataset is published, which activates a Dataset trigger. So if I publish one dataset every day, I activate a Dataset trigger that day once it's published."

From this description it sounds like you created a trigger cycle. Am I missing something?
Basically, you can break the cycle by triggering only on a new Dataset with a specific tag (or by creating the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "original" Dataset trigger.
Make sense?
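
One way the suggested filter might be expressed, as a sketch only: the tag, project and trigger names are hypothetical, and the training task ID and queue must come from your own setup. It relies on TriggerScheduler.add_dataset_trigger and its trigger_project and trigger_on_tags arguments, and the import is lazy so the argument builder can be inspected on its own.

```python
TRAINING_TAG = "ready-for-training"    # hypothetical tag added only to curated datasets
CURATED_PROJECT = "curated-datasets"   # hypothetical project, separate from the auto datasets


def training_trigger_kwargs(training_task_id, queue):
    """Arguments for a dataset trigger that ignores the automatically created datasets."""
    return {
        "name": "train on tagged datasets",  # hypothetical trigger name
        "schedule_task_id": training_task_id,
        "schedule_queue": queue,
        "trigger_project": CURATED_PROJECT,  # only watch this project
        "trigger_on_tags": [TRAINING_TAG],   # fire only when this tag is added
    }


def start_training_trigger(training_task_id, queue):
    """Start a trigger scheduler with the filtered dataset trigger."""
    from clearml.automation import TriggerScheduler  # lazy import: needs ClearML configured

    trigger = TriggerScheduler(pooling_frequency_minutes=3)
    trigger.add_dataset_trigger(**training_trigger_kwargs(training_task_id, queue))
    trigger.start()
```

With a filter like this, datasets created automatically in another project, or without the tag, would not start the training task again.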

  				
Posted 3 years ago by AgitatedDove14

Let me share the code with you, and how I think the pieces interact with each other.

  				
Posted 3 years ago by VexedCat68

Because those spawned processes are from a file, register_dataset.py. However, I'm personally not using any file like that, and I think it's a file from the library.

  				
Posted 3 years ago by VexedCat68
