Step 1 was aborted, but the second still was scheduled
Hi, any chance you've had some time to check whether you can replicate this on your side?
- It's a pipeline from Tasks.
- clearml==1.13.2
- For instance, in this pipeline, if the first task fails, the remaining tasks are not scheduled for execution, which is what I expect. I am just surprised that if the first task is instead aborted by the user, the following task is still scheduled for execution (and will fail, because it depends on the first one completing).
Ok - good to know this is odd 🙂
It's created like this (I removed some bits for readability)
def _run(pipeline_id, step):
    from pipeline_broker import pipeline

    pipeline.run_step(pipeline_id=pipeline_id, step=step)
def launch(
    cfg,
    queue: str = "default",
    abort_on_failure: bool = False,
    project: str = "TrainingPipeline",
    start_locally: bool = False,
    task_regex: str = ".*",
):
    ...
    pipe = PipelineController(
        project=project,
        name...
I am running clearml-agent 1.6.1
Neat - looks like exactly what I was looking for, thx
Thx, working now on 1.14.2 🙂
No, just the clearml-agent
I also created an issue in the repo directly. Thx for your help.
Yes, I agree, it should be considered failed, and the PipelineController should not trigger the following task, which depends on the first one. My problem is that this is not the behavior I observe: the second task still gets scheduled for execution. Is there a way to specify that in the PipelineController logic?
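To make sure we're talking about the same knobs: as I understand the docs, the dependency is declared via `parents`, and `continue_on_fail` (default False) is the flag that stops children of a *failed* parent from being scheduled; whether an *aborted* parent counts is exactly the gap here. A sketch of the wiring (the helper name and step names are hypothetical):

```python
def add_dependent_steps(pipe, first, second):
    """Wire two function steps so `second` only runs after `first`.

    `pipe` is assumed to be a clearml PipelineController;
    `first`/`second` are plain functions used as pipeline steps.
    """
    pipe.add_function_step(name="first", function=first)
    pipe.add_function_step(
        name="second",
        function=second,
        parents=["first"],       # schedule only after `first` finishes
        continue_on_fail=False,  # default: a failed parent stops children
    )
```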
And if I create a Pro account myself, can I somehow piggyback on the existing UIs to display the state of the Autoscaler Task?
Hey, finally got to try it, sorry about the delay.
However, I tried on 1.14.1 but I still get the same behavior
root@clement-controller-1:~# head clearml.conf
agent {
    default_docker {
        arguments: ["-v", "/var/run/docker.sock:/var/run/docker.sock"]
    }
}
Neat - it works! Thanks for the quick response 🙂
python3 -m clearml_agent --config-file clearml.conf daemon --foreground --queue services --service --docker --cpu-only
So I can confirm I have the same behavior with this minimal example
#!/usr/bin/env python3
import time
from typing import Optional

import fire
from clearml import PipelineController


def step_one(a=1):
    print("Step 1")
    time.sleep(120)
    return True


def step_two(a=1):
    print("Step 2")
    time.sleep(120)
    return True


def launch():
    pipe = PipelineController(
        project="TEST",
        name="Pipeline demo",
        version="1.1",
        add_pipeline_tags=False,
        ...
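Picking up where the snippet above cuts off: after constructing the controller, the steps are added and the pipeline is started. A hedged sketch of that tail (the queue name and the local/remote switch are my assumptions, mirroring the `launch()` signature earlier in the thread; sleeps are there so step 1 can be aborted from the UI mid-run):

```python
def step_one(a=1):
    import time  # function steps are serialized; import inside the body

    print("Step 1")
    time.sleep(120)  # long enough to abort step 1 from the UI
    return True


def step_two(a=1):
    import time

    print("Step 2")
    time.sleep(120)
    return True


def launch(queue: str = "default", start_locally: bool = False):
    # Deferred import: needs clearml installed and server credentials.
    from clearml import PipelineController

    pipe = PipelineController(
        project="TEST",
        name="Pipeline demo",
        version="1.1",
        add_pipeline_tags=False,
    )
    pipe.add_function_step(name="step_one", function=step_one)
    pipe.add_function_step(
        name="step_two", function=step_two, parents=["step_one"]
    )
    if start_locally:
        # Run the controller (and the steps) on this machine.
        pipe.start_locally(run_pipeline_steps_locally=True)
    else:
        pipe.start(queue=queue)
```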
Ok - I customized it a bit to our workflow, so I wanted to keep our "fork" of the autoscaler, but I guess this is not supported.
And same behavior if I make the dependency explicit via the return of the first one
#!/usr/bin/env python3
import time
from typing import Optional

import fire
from clearml import PipelineController


def step_one(a=1):
    import time

    print("Step 1")
    time.sleep(120)
    return True


def step_two(a=1):
    import time

    print("Step 2")
    time.sleep(120)
    return True


def launch(
    tenant: str = "demo",
    loc_id: str = "common",
    tag: str = "test",
    pipeline_id: Optio...
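For completeness, the "explicit via the return" variant can, as I read the docs, also be expressed by naming the first step's return value (`function_return`) and referencing it from the second step's kwargs, which makes the controller infer the dependency; the return name "done" and the rest of the wiring here are illustrative:

```python
def step_one(a=1):
    print("Step 1")
    return True


def step_two(a=1):
    print("Step 2", a)
    return True


def launch(queue: str = "default"):
    # Deferred import: needs clearml installed and server credentials.
    from clearml import PipelineController

    pipe = PipelineController(project="TEST", name="Pipeline demo", version="1.1")
    pipe.add_function_step(
        name="step_one",
        function=step_one,
        function_return=["done"],  # expose step_one's return value as "done"
    )
    pipe.add_function_step(
        name="step_two",
        function=step_two,
        # Referencing step_one's output should make step_two depend on it.
        function_kwargs={"a": "${step_one.done}"},
        parents=["step_one"],  # stated explicitly as well, for clarity
    )
    pipe.start(queue=queue)
```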
Yep - sounds perfect 🙂