However, I have another problem. I have a dataset trigger that schedules a task.
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue even though the trigger only fired once.
But what's happening is that I only publish a dataset once, but every time it polls, the trigger fires and enqueues a task.
So I took the dataset trigger from this and added it to my own test code, which needs to run a task every time this trigger is activated.
So I published a dataset just once, but it keeps scheduling the task.
This here shows my situation. You can see the code on the left and the tasks called 'Cassava Training' on the right. They keep getting enqueued even though I only sent a trigger once. By that I mean I only published a dataset once.
So in my head, every time I publish a dataset, it should get triggered and run that task.
Okay, so they ran once I started a clearml-agent listening to that queue.
So it won't work without clearml-agent? Sorry for the barrage of questions. I'm just very confused right now.
Also, could you explain the difference between trigger.start() and trigger.start_remotely()?
start() will run the trigger process (the one "watching the changes") locally (this makes sense for debugging etc.)
start_remotely() will launch the trigger process on the "services" queue, where it should live forever 🙂
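To make the distinction concrete, here is a minimal sketch. It assumes `trigger` is a `TriggerScheduler` from `clearml.automation`, as in the linked trigger example; the helper name `launch` and its `remote` flag are made up for illustration and are not part of ClearML.

```python
def launch(trigger, remote=False, queue="services"):
    """Start a trigger scheduler either locally or via an agent.

    `trigger` is expected to be a clearml.automation.TriggerScheduler.
    This helper and its `remote` flag are hypothetical, for illustration only.
    """
    if remote:
        # Hand the watcher process itself to an agent serving `queue`,
        # so it keeps running after this script exits.
        trigger.start_remotely(queue=queue)
        return "remote"
    # Runs the watcher in the current process; handy while debugging.
    trigger.start()
    return "local"
```

With a real scheduler this would be called as `launch(TriggerScheduler(pooling_frequency_minutes=1.0))` while debugging, and with `remote=True` once the trigger behaves as expected.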
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Nice!
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue even though the trigger only fired once.
Hmm, I think I'm a bit lost here (and I have a feeling there is a hidden bug somewhere that I'd like us to fix)
How exactly do I make it trigger twice on the same Dataset?
VexedCat68, what do you mean by trigger? Do you want some indication that a dataset was published so you can move to the next step in your pipeline?
Hi VexedCat68
Check this example:
https://github.com/allegroai/clearml/blob/4f9aaa69ed2d5b8ea68ebee5508610d0b1935d5f/examples/scheduler/trigger_example.py#L44
To be more clear: an example use case for me would be a pipeline that, every time a new dataset/batch is published using clearml-data, does the following:
get the data, train on it, then save the model and publish it.
I want to start this process with a trigger when a dataset is published to the server. Any example which I can look to for accomplishing something like this?
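A rough sketch of that use case follows, modelled on the linked trigger example. It assumes `scheduler` is a `clearml.automation.TriggerScheduler` and that a training task already exists on the server. The helper `add_retrain_trigger`, the `"<training_task_id>"` placeholder and the `"datasets"` project name are hypothetical. Choosing either tags or publish (never both) is a decision of this sketch, not something ClearML requires.

```python
def add_retrain_trigger(scheduler, task_id, queue="default", project=None, tags=None):
    """Register a dataset trigger that launches an existing training task.

    `scheduler` is expected to be a clearml.automation.TriggerScheduler.
    The helper name and its defaults are hypothetical.
    """
    kwargs = dict(
        name="retrain on new dataset",
        schedule_task_id=task_id,  # existing task the scheduler should launch
        schedule_queue=queue,      # an agent must be listening on this queue
        trigger_project=project,   # only watch datasets in this project
    )
    if tags:
        # Fire when a dataset receives one of these tags.
        kwargs["trigger_on_tags"] = list(tags)
    else:
        # Fire when a dataset is published.
        kwargs["trigger_on_publish"] = True
    scheduler.add_dataset_trigger(**kwargs)
    return kwargs


def main():
    # Imported lazily so the helper above can be exercised without a server.
    from clearml.automation import TriggerScheduler

    scheduler = TriggerScheduler(pooling_frequency_minutes=1.0)
    add_retrain_trigger(scheduler, task_id="<training_task_id>", project="datasets")
    scheduler.start()  # local run, for debugging
```

Calling `main()` needs a configured ClearML server, and the launched task stays pending until an agent serves the chosen queue.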
It works; however, it shows the task as enqueued and pending. Note that I am using .start() and not .start_remotely() for now.
Yes, for an enqueued task to run, you need an agent listening to the queue it was enqueued on 🙂
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Also, the task just prints a small string on the console.
I'd like to add an update to this: when I use a schedule function instead of a schedule task with the dataset trigger, it works as intended. It runs the desired function when triggered, then goes back to sleep on the next poll since no other trigger fired.
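For comparison, the function-based variant might look like the sketch below. It assumes the callback receives the dataset ID as its single argument, as in the linked trigger example. The names `on_dataset_published`, `add_function_trigger` and the `seen` list are hypothetical. As far as I understand, the function runs inside the scheduler process itself, so no queue is involved.

```python
seen = []  # dataset IDs the callback has been invoked with (illustration only)


def on_dataset_published(dataset_id):
    """Callback invoked by the scheduler when the trigger fires."""
    seen.append(dataset_id)
    print("trigger running dataset_id={}".format(dataset_id))


def add_function_trigger(scheduler, project=None):
    """Register a dataset trigger that calls a function instead of enqueuing a task.

    `scheduler` is expected to be a clearml.automation.TriggerScheduler.
    """
    scheduler.add_dataset_trigger(
        name="notify on publish",
        schedule_function=on_dataset_published,
        trigger_project=project,
        trigger_on_publish=True,
    )
```

If each published dataset fires exactly once, `seen` should end up with one entry per dataset, which makes a repeated firing easy to spot while debugging.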
VexedCat68
But what's happening is that I only publish a dataset once, but every time it polls,
This seems wrong (i.e. a bug?!). How do you set up the trigger? Is the trigger Task constantly running, or are you re-launching it?