
Hi Guys,

I have a new question related to TriggerScheduler. I am seeing very erratic behavior with dataset triggers. I have a cron scheduler that, after a file gets dropped on S3, creates a dataset in a project with some tags, in particular "processed=false".
I also have a TriggerScheduler with an add_dataset_trigger that launches a task ID. The cron scheduler works great, but the TriggerScheduler fires maybe 10% of the time.
Here is my config:

from clearml.automation import TriggerScheduler

# Poll for trigger events and sync the scheduler state every 0.1 minutes
trigger = TriggerScheduler(
    pooling_frequency_minutes=0.1, sync_frequency_minutes=0.1
)

# batch_transfer_params and schedule_task_id are defined elsewhere
for client in batch_transfer_params:
    trigger.add_dataset_trigger(
        name=f"batch processing - {client['client_name']}",
        # schedule_function=trigger_dataset_func,
        schedule_task_id=schedule_task_id,
        schedule_queue="high-mem",
        trigger_project=client["client_name"],
        trigger_name="batch_processing - incoming",
        target_project=client["client_name"],
        task_parameters={
            "client_name": client["client_name"],
            "dataset_id": "${dataset.id}",
        },
        trigger_required_tags=["dm=true", "in=true", "processed=false"],
        single_instance=True,
    )

I have tried so many scenarios by now that I don't know what to think anymore; I cannot get it to work reliably.

  • Sometimes removing the processed=false tag and putting it back will fire the trigger, but sometimes it won't.
  • I checked the triggers after adding them via trigger.get_triggers() and they look fine: one trigger is created per subfolder.

Any pointers are appreciated.

Thanks a lot for all the work.

  
  
Posted one day ago

Answers


I will write up here what I found. trigger_on_tags and trigger_required_tags actually behave the same way, and the tags are combined with OR. If you want all of the tags to be required, make sure you put "__$all" first in the list.
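To make the difference concrete, here is a small plain-Python model of the matching behavior described above. It is only an illustration of the OR versus "__$all" semantics as reported in this answer, not ClearML's actual implementation; the helper name tags_match is hypothetical.

```python
ALL_MARKER = "__$all"  # marker mentioned in the answer above


def tags_match(required, dataset_tags):
    """Hypothetical model of the reported behavior, not ClearML code.

    Without the marker, any single required tag is enough (OR).
    With the marker as the first entry, every remaining tag must be present.
    """
    present = set(dataset_tags)
    if required and required[0] == ALL_MARKER:
        return all(tag in present for tag in required[1:])
    return any(tag in present for tag in required)


required = ["dm=true", "in=true", "processed=false"]

# OR semantics: a dataset carrying only one of the tags still matches
print(tags_match(required, ["dm=true"]))

# With the marker first, the same dataset no longer matches
print(tags_match([ALL_MARKER] + required, ["dm=true"]))

# A dataset carrying all three tags matches in both cases
print(tags_match([ALL_MARKER] + required, required))
```

Under this model, the tag list from the question would become ["__$all", "dm=true", "in=true", "processed=false"] if all three tags are meant to be required together.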
In my opinion there is also a bug in the deserialization process: triggers get de-duplicated by trigger name, and when using trigger_project, dozens of triggers are created with the same name (one per dataset in the project). This leads to random behavior, depending on which project ID survives deserialization.
I solved it by subclassing TriggerScheduler and overriding add_dataset_trigger so that triggers created via trigger_project get a unique name (I use a combination of name and project_id).
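The naming collision can be sketched in plain Python. This is a simplified model of the de-duplication described above, assuming triggers are keyed by name when they are restored; it does not call ClearML, and unique_trigger_name, dedupe_by_name, and the project IDs are hypothetical names chosen for illustration.

```python
def unique_trigger_name(name, project_id):
    """Combine the trigger name with the project ID so each name is distinct."""
    return f"{name} [{project_id}]"


def dedupe_by_name(triggers):
    """Model of name-based de-duplication: the last trigger with a name wins."""
    by_name = {}
    for trig in triggers:
        by_name[trig["name"]] = trig
    return list(by_name.values())


# Hypothetical project IDs, one per dataset sub-project
project_ids = ["proj-a", "proj-b", "proj-c"]
base_name = "batch processing - acme"

# Same name for every project: only one trigger survives de-duplication
same_name = [{"name": base_name, "project": pid} for pid in project_ids]
print(len(dedupe_by_name(same_name)))

# Name plus project ID: every trigger survives
unique = [
    {"name": unique_trigger_name(base_name, pid), "project": pid}
    for pid in project_ids
]
print(len(dedupe_by_name(unique)))
```

In the first case, which project's trigger survives depends on ordering, which would explain why the trigger appears to fire only some of the time.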

  
  
Posted 12 hours ago