because those spawned processes come from a file, register_dataset.py; however, I'm personally not using any file like that, and I think it's a file from the library.
AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
Let me tell you what I think is happening and you can correct me where I'm going wrong.
Under certain conditions at certain times, a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.
N publishes = N Triggers = N Anonymous Tasks, right?
Hi VexedCat68
The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.
Based on the screenshots, these are the Datasets (each of which is also a Task with a specific type, etc.).
I would actually name the Datasets you are creating. You also need to specify the parent version (i.e. how else would it know it is a child dataset changeset?). I'm assuming they are all uploading everything, hence still running?
BTW: you can use the argument single_instance=True, making sure that no new function callback is created until the previous one has completed.
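To make the single_instance idea concrete, here is a toy model in plain Python. It is not ClearML code: the class SingleInstanceRunner and its tick() method are made-up names for illustration only. It only shows the behaviour described above, where a new callback is skipped while the previous one is still running.

```python
import threading


class SingleInstanceRunner:
    """Toy stand-in for one scheduler entry (hypothetical, not the ClearML API)."""

    def __init__(self, callback, single_instance=True):
        self.callback = callback
        self.single_instance = single_instance
        self._active = []   # threads started by tick()
        self.launched = 0   # how many callbacks were actually started
        self.skipped = 0    # how many schedule events were ignored

    def tick(self):
        """Handle one schedule event; return True if a callback was started."""
        # Forget runs that have already finished.
        self._active = [t for t in self._active if t.is_alive()]
        if self.single_instance and self._active:
            self.skipped += 1
            return False
        thread = threading.Thread(target=self.callback)
        thread.start()
        self._active.append(thread)
        self.launched += 1
        return True

    def wait(self):
        """Block until every started callback has returned."""
        for thread in self._active:
            thread.join()


def demo(single_instance):
    # The callback blocks until `release` is set, imitating a long upload.
    release = threading.Event()
    runner = SingleInstanceRunner(release.wait, single_instance=single_instance)
    for _ in range(3):
        runner.tick()
    release.set()
    runner.wait()
    return runner.launched, runner.skipped


if __name__ == "__main__":
    print(demo(True))   # one run started, two events skipped
    print(demo(False))  # three overlapping runs
```

Without the guard, every schedule event starts another run on top of the unfinished ones, which is one way a pile of "still running" entries can build up.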
VexedCat68
a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.
From this description it sounds like you created a trigger cycle. Am I missing something?
Basically you can break the cycle by saying, trigger only on New Dataset with a specific Tag (or create the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "original" Dataset trigger.
Make sense?
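A small plain-Python simulation of that cycle, in case it helps. Nothing here calls ClearML: the function simulate, the tag names "raw" and "auto", and the max_events cap are all assumptions made up for the illustration. The triggered job publishes its own dataset into the same project, so without a filter the trigger keeps firing on its own output.

```python
from collections import deque


def simulate(trigger_tags=None, max_events=20):
    """Count how often the trigger fires after a single manual publish.

    trigger_tags: if given, the trigger only reacts to datasets that carry
    at least one of these tags (toy version of "trigger only on a specific tag").
    max_events: safety cap so the unfiltered cycle terminates in this demo.
    """
    published = deque([{"name": "manual-dataset", "tags": {"raw"}}])
    fired = 0
    while published and fired < max_events:
        dataset = published.popleft()
        if trigger_tags is not None and not (dataset["tags"] & set(trigger_tags)):
            continue  # this publish event is ignored by the trigger
        fired += 1
        # The triggered job publishes an automatic dataset in the same project.
        published.append({"name": f"auto-dataset-{fired}", "tags": {"auto"}})
    return fired


if __name__ == "__main__":
    print(simulate())                       # runs until the safety cap is hit
    print(simulate(trigger_tags=["raw"]))   # fires once, then stops
```

Publishing the automatic datasets into a different project would break the loop in the same way, since the trigger would never see them.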
I'm still a bit confused by the fact that, since my function runs once per hour, why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?
VexedCat68 I think this is the issue described here:
https://github.com/allegroai/clearml/issues/491
Can you test with the latest RC:
pip install clearml==1.1.5rc1
The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.
Can you share the code and the way you're running it?
Apparently it keeps calling this register_dataset.py script
why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?
The anonymous Tasks are the Datasets you are creating (a Dataset version is also a Task of a certain type with artifacts; the idea is that Datasets are usually created from code, hence the need to combine the two).
Make sense?
Can you spot something here? Because to me it still looks like it should only create a new Dataset object if the batch size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.
Once the data is published, a dataset trigger is activated in the checkbox_.... file, which creates a clearml-task for training the model.
Let me share the code with you, and how I think they interact with each other.
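For reference, the flow described above can be sketched like this in plain Python. This is not the actual script: maybe_publish_batch and the publish callable are hypothetical stand-ins for "create and publish a dataset", and the file names are invented. One thing worth checking against the real code is that the create step sits behind the size check, as it does here.

```python
import tempfile
from pathlib import Path


def maybe_publish_batch(folder, batch_size, publish):
    """Publish the folder's files as one batch, or do nothing if there are too few.

    `publish` is a hypothetical callable standing in for "create and publish
    a dataset"; it receives the list of file paths.
    Returns True if a batch was published.
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if len(files) < batch_size:
        return False  # requirement not met: nothing is created at all
    publish(files)
    for path in files:
        path.unlink()  # empty the directory only after publishing
    return True


def demo():
    published = []  # records the size of every published batch
    with tempfile.TemporaryDirectory() as folder:
        results = []
        for i in range(3):
            (Path(folder) / f"sample_{i}.txt").write_text("x")
            results.append(
                maybe_publish_batch(folder, 3, lambda fs: published.append(len(fs)))
            )
        remaining = len(list(Path(folder).iterdir()))
    return results, published, remaining


if __name__ == "__main__":
    print(demo())
```

In this sketch only the third call publishes anything, and the folder is empty afterwards.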