Answered
Hey, we've experienced some issues with ClearML Trigger Schedulers we were playing with in the last few days. This is what happened:

Hey,

we've experienced some issues with Clearml Trigger Schedulers we were playing with in the last few days. This is what happened:
We have a trigger listening to the queued events on a particular queue and with a particular tag. An actual excerpt from our codebase:

# this should call start_trigger when a task with 'particular-tag' tag is enqueued
trigger.add_task_trigger(
    trigger_required_tags=['particular-tag'],
    schedule_function=start_trigger,
    trigger_on_status=['queued'],
    name="job_start",
)

  • We started an experiment as usual (not in the particular queue and without the particular tag). It worked.
  • We aborted it. It stopped.
  • We tried to enqueue the stopped task in the particular queue and added the particular tag.
  • The trigger did not fire. The experiment stayed pending forever.

We don't know for sure why this happens but it seems that status_changed is not updated accordingly when the status changes from draft to pending. Any ideas?
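
To make the suspicion concrete, here is a toy model of a polling trigger. It assumes the trigger only fires for tasks whose status_changed timestamp is newer than the previous poll; that is our guess, not confirmed ClearML behaviour. ToyTask and ToyPollingTrigger are hypothetical names and not part of the ClearML API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task record; not a ClearML class."""
    task_id: str
    status: str
    tags: set = field(default_factory=set)
    status_changed: datetime = datetime(2023, 1, 1)


class ToyPollingTrigger:
    """Assumed logic: fire for tasks whose status_changed is newer than the last poll."""

    def __init__(self, required_tags, on_status, start):
        self.required_tags = set(required_tags)
        self.on_status = set(on_status)
        self.last_poll = start

    def poll(self, tasks, now):
        fired = [
            t.task_id
            for t in tasks
            if t.status in self.on_status
            and self.required_tags <= t.tags
            and t.status_changed > self.last_poll
        ]
        self.last_poll = now
        return fired


t0 = datetime(2023, 1, 1, 12, 0)
trigger = ToyPollingTrigger(["particular-tag"], ["queued"], start=t0)

# A task that was stopped before the trigger started polling.
task = ToyTask("abc", "stopped", status_changed=t0 - timedelta(hours=1))

# Re-enqueue it and add the tag, but leave status_changed stale (the suspected bug).
task.status = "queued"
task.tags.add("particular-tag")
stale_result = trigger.poll([task], now=t0 + timedelta(seconds=10))

# Same task, but with the timestamp refreshed after the status change.
task.status_changed = t0 + timedelta(seconds=15)
fresh_result = trigger.poll([task], now=t0 + timedelta(seconds=20))

print(stale_result, fresh_result)
```

Under that assumption, the re-enqueued task is skipped while its timestamp is stale and only picked up once the timestamp moves, which matches what we observe.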

  
  
Posted one year ago

Answers 7


Hi RotundHedgehog76
Notice that the "queued" condition is on the state of the Task, as well as the tag.

We tried to enqueue the stopped task at the particular queue and we added the particular tag

What do you mean by specific queue? This will trigger on any queued Task with the 'particular-tag'?

  
  
Posted one year ago

We use an empty queue to enqueue our tasks in, just to trigger the scheduler

its only importance is that the experiment is not enqueued anywhere else, but the trigger then enqueues it

👍

It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)

Is the trigger controller running on the services queue ?

  
  
Posted one year ago

Yeah, you are right.

We use an empty queue to enqueue our tasks in, just to trigger the scheduler 😅 Its only importance is that the experiment is not enqueued anywhere else, but the trigger then enqueues it

It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)

  
  
Posted one year ago

however, I don't think it's our code, since the trigger is not triggered at all, unless a new task is created :((

Yeah, I think you are correct, I'm more interested in understanding how you use it ...
BTW can you test with the latest clearml python version (the trigger code is the important part)?

  
  
Posted one year ago

This is odd... can you post the entire trigger code ?
also what's the clearml version?

  
  
Posted one year ago

Is the trigger controller running on the services queue ?

Yes, yes it is

  
  
Posted one year ago

Unfortunately, no, I can't paste the whole code. In a nutshell, the trigger spawns a new GCE instance with a clearml-agent running to schedule the experiments in Cloud.
This is an excerpt:

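# imports this excerpt relies on
from clearml import Task
from clearml.automation import TriggerScheduler

def gcp_start_trigger(task_id: str):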
    curr_task = Task.get_task(task_id)
    #curr_task.reset(force=True)
    config = extract_config(curr_task)
    machine_type = config.get('machine-type')
    queue_name = f"gcp/{machine_type}"
    ensure_queue(queue_name)  # creates a new queue if it doesn't exist
    instance_name = name_generator(task_id)
    print(config)  # debug print
    gpus = create_gpus(config)  # define gpus
    create_from_machine_type(
        project_id=GOOGLE_PROJECT,
        zone=f"{GOOGLE_ZONE}",
        instance_name=instance_name,
        machine_type=machine_type,
        accelerators=gpus,
        queue_name=queue_name
    )
    Task.dequeue(curr_task)  # remove from an empty queue
    Task.enqueue(curr_task, queue_name=queue_name)  # put the task in a particular queue
    return

def gcp_stop_trigger(task_id):
    instance_name = name_generator(task_id)
    delete_instance(
        project_id=GOOGLE_PROJECT,
        zone=f"{GOOGLE_ZONE}",
        machine_name=instance_name
    )
    delete_disk(
        project_id=GOOGLE_PROJECT,
        zone=f"{GOOGLE_ZONE}",
        machine_name=f"{instance_name}",
    )
    return

trigger = TriggerScheduler(pooling_frequency_minutes=10/60)  # poll every 10 seconds
trigger.add_task_trigger(
    trigger_required_tags=['google'],
    schedule_function=gcp_start_trigger,
    trigger_on_status=['queued'],
    name="job_start",
)
trigger.add_task_trigger(
    trigger_required_tags=['google'],
    schedule_function=gcp_stop_trigger,
    trigger_on_status=['failed', 'completed', 'stopped', 'closed'],
    name="job_end",
)
trigger.start_remotely()

however, I don't think it's our code, since the trigger is not triggered at all, unless a new task is created :((
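
One way to check whether the callback is reached at all is to wrap it in a small logging decorator before passing it as schedule_function. This is only a sketch: log_trigger_calls and demo_start_trigger are hypothetical names, and whether the scheduler accepts a wrapped function when started remotely is an assumption that would need verifying.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("trigger-debug")


def log_trigger_calls(func):
    """Log every invocation of a trigger callback, including failures."""

    @functools.wraps(func)
    def wrapper(task_id, *args, **kwargs):
        log.info("trigger callback %s called for task %s", func.__name__, task_id)
        try:
            return func(task_id, *args, **kwargs)
        except Exception:
            # Record the traceback, then re-raise so the failure is not hidden.
            log.exception("trigger callback %s failed for task %s", func.__name__, task_id)
            raise

    return wrapper


calls = []


@log_trigger_calls
def demo_start_trigger(task_id):
    # Hypothetical placeholder standing in for gcp_start_trigger.
    calls.append(task_id)


demo_start_trigger("abc123")
```

If no "called for task" line ever shows up in the trigger task's console log, the callback body itself can be ruled out.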

as for the clearml version, they differ:

  • the clearml server we self-host shows this: WebApp: 1.7.0-232 • Server: 1.7.0-232 • API: 2.21
  • the installed clearml in a trigger task shows clearml==1.8.2
  • the installed clearml in the experiment task that attempts to trigger is 1.9.0
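
For reference, a minimal way to print the installed package versions from inside a task, using only the standard library. installed_version is a hypothetical helper name, and "clearml-agent" is included only as an example of a second package to check.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


report = {name: installed_version(name) for name in ("clearml", "clearml-agent")}
print(report)
```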
  
  
Posted one year ago