I'm still a bit confused: since my function runs only once per hour, why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?
Can you share the code and the way you're running it?
VexedCat68 I think this is the issue described here:
https://github.com/allegroai/clearml/issues/491
Can you test with the latest RC: pip install clearml==1.1.5rc1
why are there indefinitely growing anonymous tasks, even after I've closed the main schedulers?
The anonymous Tasks are the Datasets you are creating. (A Dataset version is also a Task of a certain type with artifacts; the idea is that Datasets are usually created from code, hence the need to combine the two.)
Make sense?
Can you spot something here? Because to me it still looks like it should only create a new Dataset object if the batch size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.
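A minimal sketch of the gating described above, using only the standard library. BATCH_SIZE, maybe_publish and the publish callback are hypothetical names for illustration, not the code from this thread. The point is that the size check happens before any dataset object is created, so a run that finds too few files leaves nothing behind. With ClearML, the publish callback is where calls such as Dataset.create(...), add_files(...), upload(), finalize() and publish() would presumably go.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

BATCH_SIZE = 3  # hypothetical threshold; the thread does not state the real value


def maybe_publish(data_dir: Path, publish: Callable[[List[Path]], None]) -> bool:
    """Publish the accumulated files only once BATCH_SIZE of them exist."""
    files = sorted(p for p in data_dir.iterdir() if p.is_file())
    if len(files) < BATCH_SIZE:
        # Return before any dataset object is created, so nothing is left running.
        return False
    # Assumption: `publish` wraps the real dataset creation and publishing.
    publish(files)
    for p in files:
        p.unlink()  # empty the directory once the batch has been published
    return True


if __name__ == "__main__":
    batches = []
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.txt").write_text("1")
        print(maybe_publish(folder, batches.append))  # below the threshold
        (folder / "b.txt").write_text("2")
        (folder / "c.txt").write_text("3")
        print(maybe_publish(folder, batches.append))  # threshold reached
```

If the real function creates the Dataset object before this check, every hourly run would leave a new unfinished Dataset Task, which would match the symptom described.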
Once the data is published, a dataset trigger is activated in the checkbox_.... file, which creates a clearml-task for training the model.
AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
Hi VexedCat68
The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.
Based on the screenshots these are the Datasets (each of which is also a Task with a specific type, etc.).
I would actually name the Datasets you are creating.
You need to specify the parent version (i.e. how else would it know it is a child dataset changeset?).
I'm assuming they are all uploading everything, hence still running?
BTW: you can use the argument single_instance=True, which makes sure that no new function callback is created until the previous one has completed.
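To illustrate what a single-instance guarantee means, here is a small standard-library model. It is a sketch of the behaviour described above, not ClearML's implementation, and SingleInstanceJob is a hypothetical name. In ClearML itself the setting is the single_instance=True argument mentioned in the message, passed when the scheduled function is registered.

```python
import threading
from typing import Callable, Optional


class SingleInstanceJob:
    """Skip a scheduled run while the previous run is still in progress."""

    def __init__(self, func: Callable[[], None]) -> None:
        self._func = func
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self.skipped = 0

    def trigger(self) -> bool:
        # Non-blocking acquire: fails at once if a previous run holds the lock.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        self._thread = threading.Thread(target=self._run)
        self._thread.start()
        return True

    def _run(self) -> None:
        try:
            self._func()
        finally:
            self._lock.release()  # allow the next scheduled run

    def wait(self) -> None:
        """Block until the current run, if any, has finished."""
        if self._thread is not None:
            self._thread.join()


if __name__ == "__main__":
    started, release = threading.Event(), threading.Event()

    def slow_upload() -> None:
        started.set()
        release.wait()  # stands in for a long dataset upload

    job = SingleInstanceJob(slow_upload)
    print(job.trigger())  # first run starts
    started.wait()
    print(job.trigger())  # skipped, the first run is still going
    release.set()
    job.wait()
```

Without such a guard, an hourly schedule combined with uploads that take longer than an hour would accumulate overlapping runs.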
Let me share the code with you, and how I think they interact with each other.
VexedCat68
a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.
From this description it sounds like you created a trigger cycle, am I missing something?
Basically you can break the cycle by saying, trigger only on New Dataset with a specific Tag (or create the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "original" Dataset trigger.
Make sense?
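The tag-based way of breaking the cycle can be modelled like this. It is a plain-Python sketch with hypothetical names (PublishEvent, should_fire, the "trigger-training" tag and the project names); it does not call ClearML. In ClearML the equivalent filtering would be configured on the dataset trigger itself, so check the TriggerScheduler documentation for the exact arguments.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PublishEvent:
    """A dataset publish event as the trigger would see it (hypothetical model)."""
    dataset_name: str
    project: str
    tags: FrozenSet[str] = frozenset()


def should_fire(event: PublishEvent, project: str, required_tag: str) -> bool:
    """Fire only for datasets in the watched project that carry the tag."""
    return event.project == project and required_tag in event.tags


if __name__ == "__main__":
    watched_project, tag = "data", "trigger-training"  # assumed names
    events = [
        PublishEvent("raw-batch", "data", frozenset({tag})),  # tagged on purpose
        PublishEvent("auto-dataset", "data"),                 # created automatically, no tag
        PublishEvent("auto-dataset", "data/auto", frozenset({tag})),  # other sub-project
    ]
    for event in events:
        print(event.dataset_name, event.project, should_fire(event, watched_project, tag))
```

Only the first event fires: the automatically created datasets either lack the tag or live in a different project, so they no longer feed back into the original trigger.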
because those spawned processes come from a file register_dataset.py; however, I'm personally not using any file like that, and I think it's a file from the library.
Apparently it keeps calling this register_dataset.py script.
Let me tell you what I think is happening and you can correct me where I'm going wrong.
Under certain conditions at certain times, a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.
N publishes = N Triggers = N Anonymous Tasks, right?
The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.