ClearML FAQ | Hey, I'm running a pipeline, and 1 stage passed - but the next one failed. I fixed the bug for the second one - is there any way to retry the pipeline from the failure?

Answered

Hey, I'm running a pipeline, and 1 stage passed - but the next one failed. I fixed the bug for the second one - is there any way to retry the pipeline from the failure?

  				
Posted 4 years ago by CleanPigeon16

Answers 18

Hi CleanPigeon16
Yes, there is. When you clone the pipeline in the UI, go to Configuration/Pipeline/continue_pipeline and change it to True.

  				
Posted 4 years ago by AgitatedDove14

Also, I tried the continue_pipeline option, but it didn't work because it couldn't parse the previous step that ran:
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
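
To see where an error like this can come from, here is a minimal sketch of how a reference such as `${run_experiment.models.output.-1.url}` could be resolved against the steps a controller knows about. This is not ClearML's actual implementation: `resolve_reference`, the layout of the `steps` dictionary, and the example model URL are all hypothetical. It only illustrates that the lookup fails as soon as the step name is missing from the known steps.

```python
import re

# Matches a whole '${...}' reference and captures the dotted path inside it.
REF_PATTERN = re.compile(r"^\$\{([^}]+)\}$")


def resolve_reference(ref, steps):
    """Resolve '${step.a.b.index.field}' against a dict of known steps (hypothetical layout)."""
    match = REF_PATTERN.match(ref)
    if not match:
        raise ValueError("Not a step reference: {!r}".format(ref))
    step_name, *path = match.group(1).split(".")
    if step_name not in steps:
        raise ValueError(
            "Could not parse reference '{}', step {} could not be found".format(ref, step_name)
        )
    value = steps[step_name]
    for part in path:
        # When the current value is a list, the part is an integer index such as '-1'.
        if isinstance(value, list):
            value = value[int(part)]
        else:
            value = value[part]
    return value


# Hypothetical record of one executed step with a single output model.
steps = {"run_experiment": {"models": {"output": [{"url": "s3://bucket/model.pt"}]}}}
ref = "${run_experiment.models.output.-1.url}"
print(resolve_reference(ref, steps))

try:
    resolve_reference(ref, {})  # the step is unknown, so the lookup fails
except ValueError as err:
    print(err)
```

With an empty `steps` dictionary the sketch raises a message of the same shape as the one reported here, which matches the idea that the controller no longer knows about `run_experiment` when it tries to continue.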

  				
Posted 4 years ago by CleanPigeon16

CleanPigeon16 Can you also send the "Pipeline" section of the "Configuration Object"?

  				
Posted 4 years ago by AgitatedDove14

Is there an option to do this from a pipeline, from within the add_step method? Can you link a reference to cloning and editing a task programmatically?
Nope, it works well for the pipeline when I don't choose to continue_pipeline.

  				
Posted 4 years ago by CleanPigeon16

Is there an option to do this from a pipeline, from within the add_step method? Can you link a reference to cloning and editing a task programmatically?

Hmm, I think there is an open GitHub issue requesting a similar ability, let me check on the progress...

nope, it works well for the pipeline when I don't choose to continue_pipeline

Could you send the full log please?

  				
Posted 4 years ago by AgitatedDove14

Sure, redacted most of the params as they are sensitive:
run_experiment {
  base_task_id = "478cfdae5ed249c18818f1c50864b83c"
  queue = null
  parents = []
  timeout = null
  parameters {
    # Redacted the parameters
  }
  executed = "d1d361d1059c4f0981200f59d7683773"
}
segment_slides {
  base_task_id = "ae13cc979855482683474e9d435895bb"
  queue = null
  parents = ["run_experiment"]
  timeout = null
  parameters {
    Args/param = """
    [
      # Redacted params from here as well
      ['checkpoint_filename', '${run_experiment.models.output.-1.url}'],
    ]
    """
  }
  executed = false
}
optimize_point_detection {
  base_task_id = "f91f8e36b5774cefba6aba87d85959e7"
  queue = null
  parents = ["segment_slides"]
  timeout = null
  parameters {
    # And here
  }
  executed = null
}
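
Reading the `executed` fields in this configuration, one plausible interpretation is that a task ID marks a step that already ran, while `false` or `null` marks a step that still needs to run. The sketch below applies that assumption to a plain dictionary mirroring the posted configuration. `split_steps` is a hypothetical helper, not a ClearML function.

```python
# Plain-dict mirror of the posted configuration (parameters omitted).
pipeline_state = {
    "run_experiment": {"parents": [], "executed": "d1d361d1059c4f0981200f59d7683773"},
    "segment_slides": {"parents": ["run_experiment"], "executed": False},
    "optimize_point_detection": {"parents": ["segment_slides"], "executed": None},
}


def split_steps(state):
    """Return (done, pending) step names.

    Assumption: a non-empty string in `executed` is the ID of a finished task;
    anything else (False, None) means the step still has to run.
    """
    done, pending = [], []
    for name, node in state.items():
        if isinstance(node["executed"], str) and node["executed"]:
            done.append(name)
        else:
            pending.append(name)
    return done, pending


done, pending = split_steps(pipeline_state)
print("already executed:", done)
print("still to run:", pending)
```

Under that reading, continuing the pipeline would skip `run_experiment` and start from `segment_slides`, which is why the stored task ID for `run_experiment` has to point at a task that still has its outputs.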

  				
Posted 4 years ago by CleanPigeon16

Thanks CleanPigeon16
Could you verify that Task "d1d361d1059c4f0981200f59d7683773" exists (and is not archived)?

  				
Posted 4 years ago by AgitatedDove14

Great, good to know!

  				
Posted 4 years ago by CleanPigeon16

AgitatedDove14 is there any update on the open issue you talked about before? I think it's this one: https://github.com/allegroai/clearml/issues/214

  				
Posted 4 years ago by CleanPigeon16

makes sense! thanks

  				
Posted 4 years ago by CleanPigeon16

Thanks! A follow-up question - can I make the steps in the pipeline use the latest commit in the branch?

  				
Posted 4 years ago by CleanPigeon16

Hi CleanPigeon16

can I make the steps in the pipeline use the latest commit in the branch?

Yes, in one of two ways:
manually: clone the step's Task in the UI, edit the Execution section, change it to "last commit on branch", and specify the branch name
programmatically: same as above (clone + edit)
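
A hedged sketch of the programmatic route (clone, then edit) using the ClearML Python SDK. `Task.get_task` and `Task.clone` are standard SDK calls. `Task.set_script` is, as far as I know, only available in newer SDK versions, and the idea that an empty commit makes the agent check out the latest commit on the branch is an assumption to verify against your version. `script_overrides` and `clone_on_latest_commit` are hypothetical helper names, and the branch name is a placeholder.

```python
def script_overrides(branch):
    """Build set_script() keyword arguments: pin the branch, clear the commit.

    Assumption: an empty commit is treated as "latest commit on the branch".
    """
    return {"branch": branch, "commit": ""}


def clone_on_latest_commit(base_task_id, branch):
    """Clone a task and point the (draft) clone at `branch`; return the clone's ID."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from clearml import Task

    base_task = Task.get_task(task_id=base_task_id)
    cloned = Task.clone(source_task=base_task, name=base_task.name + " (latest commit)")
    # Assumption: set_script() exists in your SDK version and accepts these arguments.
    cloned.set_script(**script_overrides(branch))
    return cloned.id


if __name__ == "__main__":
    # Placeholder branch name; no server call is made here.
    print(script_overrides("main"))
```

The clone is left in draft mode, so its ID could then be used as the `base_task_id` of a pipeline step, which matches the draft-mode base task setup described in this thread.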

ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found

Seems like the "run_experiment" step is not defined. Could that be the case?

  				
Posted 4 years ago by AgitatedDove14

yup, it's there in draft mode so I can get the latest git commit when it's used as a base task

  				
Posted 4 years ago by CleanPigeon16

yup, it's there in draft mode so I can get the latest git commit when it's used as a base task

Yes, that seems to be the problem: if it is in draft mode, you have no outputs...

  				
Posted 4 years ago by AgitatedDove14

And for some reason this clone is marked as completed. Not sure why, as it failed

  				
Posted 4 years ago by CleanPigeon16

CleanPigeon16 Coming very soon, we are adding a few features for the pipeline, and this one will also be included :)

  				
Posted 4 years ago by AgitatedDove14

The pipeline stores the state of its previous run, specifically the executed steps.
In our case the executed step was reset (I assume), so it cannot find the output model you are referring to, hence the crash.
CleanPigeon16 does that make sense?
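
A minimal sketch of a pre-flight check along these lines: before continuing, confirm that every task ID recorded as `executed` still points at a usable task. The `tasks` dictionary is a hypothetical snapshot of what the server would report (the status strings and the `archived` flag are assumptions for illustration), and `check_executed_steps` is not part of ClearML.

```python
def check_executed_steps(state, tasks):
    """Return a list of problems that would prevent continuing from the stored state."""
    problems = []
    for step, node in state.items():
        task_id = node.get("executed")
        if not task_id:
            continue  # the step never ran, so there is nothing to verify
        task = tasks.get(task_id)
        if task is None:
            problems.append("{}: task {} not found".format(step, task_id))
        elif task["archived"]:
            problems.append("{}: task {} is archived".format(step, task_id))
        elif task["status"] != "completed":
            problems.append(
                "{}: task {} is '{}', so it may have no outputs".format(step, task_id, task["status"])
            )
    return problems


state = {
    "run_experiment": {"executed": "d1d361d1059c4f0981200f59d7683773"},
    "segment_slides": {"executed": False},
}
# Hypothetical snapshot: the executed task was reset back to draft ("created").
tasks = {"d1d361d1059c4f0981200f59d7683773": {"status": "created", "archived": False}}
for problem in check_executed_steps(state, tasks):
    print(problem)
```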

  				
Posted 4 years ago by AgitatedDove14

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/opt/pyenv/versions/3.6.8/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/pyenv/versions/3.6.8/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/automation/controller.py", line 615, in _daemon
    if self._launch_node(self._nodes[name]):
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/automation/controller.py", line 436, in _launch_node
    updated_hyper_parameters[k] = self._parse_step_ref(v)
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/automation/controller.py", line 787, in _parse_step_ref
    new_val = self.__parse_step_reference(g)
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/automation/controller.py", line 724, in __parse_step_reference
    step_ref_string, prev_step))
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found

  				
Posted 4 years ago by CleanPigeon16
