Answered

Hey, I have a question regarding pipelines. Let's say I have 2 scripts: train.py and evaluate.py. Each of them creates a task using Task.init and logs some information. These scripts are run independently (in my case they are run by DVC). I would like both of them to be logged under a single pipeline. Is there a way to do it without having to define PipelineController, adding tasks to it and executing it through it? Something like "continue with this already created pipeline and add the currently run task to it".

I'm trying to connect my existing DVC pipeline to ClearML (to use its tracking functionality). If anyone has experience with this, let me know 🙂

  
  
Posted 2 years ago

Answers 8


We just do task.close() and then start a new Task.init() manually, so our "pipelines" are self-controlled.

  
  
Posted 2 years ago

ScaryLeopard77, hi! Is there a specific reason for the aversion to pipelines? What is the use case?

"continue with this already created pipeline and add the currently run task to it"

I'm not sure I understand, can you please elaborate? (I'm pretty sure it's a pipelines feature)

  
  
Posted 2 years ago

The idea is that I first call the script start_new_pipeline.py, which should just create the pipeline, and then I call the scripts train_pipeline.py and evaluate_pipeline.py, which contain the tasks that should belong to the pipeline. However, I don't know what start_new_pipeline.py should look like so that the following tasks would belong to the created pipeline.

  
  
Posted 2 years ago

Hi ScaryLeopard77
You can probably do:
`Task.init(..., continue_last_task='task_id_here')`
This will continue a previously executed Task and log both steps in the same place.
Does that help?
BTW: you can also of course manually report to any Task as it is still running with:
`aux_task = Task.get_task(task_id='task_id_here')
aux_task.get_logger().report_scalar(...)`

  
  
Posted 2 years ago

Hi AgitatedDove14.
That way I lose some execution information; only the execution information from the last Task stays logged. That's why I want to keep it as separate tasks under a single pipeline.

  
  
Posted 2 years ago

That's why I want to keep it as separate tasks under a single pipeline.

Hmm Yes, if this is the case then you definitely have to have two Tasks (with execution info on each one).
So you could just create a "draft" pipeline Task and report everything to it? Does that make sense ?
(By design a pipeline is in charge of spinning the Tasks and pulling the data/metric from them if needed, in your case it sounds like you need the Tasks to push the data/metric onto the pipeline Task, this is actually doable).
So the flow can be:
1. Create pipeline Task (draft)
2. Pass the pipeline Task ID to the "steps"
3. Have the steps report to the "pipeline" Task
Does that make sense ?

  
  
Posted 2 years ago

I kind of understand the first step -> create the pipeline task, keep it in draft state and save its ID. But how do you pass the ID to the following tasks and have them report to the pipeline (parent) task?

  
  
Posted 2 years ago

Pseudo-ish code:
1. create pipeline
`pipeline = Task.create(..., task_type="controller")
pipeline.mark_started()
print(pipeline.id)`
2. launch step A (pass arguments via command line argument / os environment)
`import os
from clearml import Task

task = Task.init(...)
pipeline_id = os.environ['MY_MAIN_PIPELINE']
pipeline_task = Task.get_task(task_id=pipeline_id)

# send some metrics / reports etc.
pipeline_task.get_logger().report_scalar(...)
pipeline_task.get_logger().report_text(...)`
wdyt? (obviously you need to somehow pass the pipeline task id to the steps, I'm not sure I understand how you actually launch these steps, but I'm assuming this is doable)
BTW: why not just use clearml-agent for launching the steps ?
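
For the "somehow pass the pipeline task id to the steps" part, here is a minimal sketch of one possible approach when the steps are launched independently (e.g. as separate DVC stages). It uses only the Python standard library and does not call ClearML itself. The `MY_MAIN_PIPELINE` variable name comes from the pseudo-code above; the `--pipeline-id` flag, the `.pipeline_task_id` file name and both helper functions are hypothetical names made up for illustration.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

ENV_VAR = "MY_MAIN_PIPELINE"         # env var name used in the pseudo-code above
ID_FILE = Path(".pipeline_task_id")  # hypothetical file shared between the scripts


def save_pipeline_id(pipeline_id: str, id_file: Path = ID_FILE) -> None:
    """Store the pipeline Task ID so that later, independent scripts can find it.

    Intended to be called once, e.g. by start_new_pipeline.py.
    """
    id_file.write_text(pipeline_id.strip() + "\n", encoding="utf-8")


def resolve_pipeline_id(
    argv: Optional[Sequence[str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    id_file: Path = ID_FILE,
) -> str:
    """Return the pipeline Task ID: CLI flag first, then env var, then file."""
    environ = os.environ if environ is None else environ

    # parse_known_args ignores any other arguments the step script defines.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--pipeline-id", default=None)
    args, _unknown = parser.parse_known_args(argv)

    if args.pipeline_id:
        return args.pipeline_id
    if environ.get(ENV_VAR):
        return environ[ENV_VAR]
    if id_file.exists():
        return id_file.read_text(encoding="utf-8").strip()
    raise RuntimeError(
        f"No pipeline Task ID found: pass --pipeline-id, set {ENV_VAR}, "
        f"or create {id_file}"
    )
```

Under these assumptions, start_new_pipeline.py would call `save_pipeline_id(pipeline.id)` right after creating the pipeline Task, and each step would do `pipeline_task = Task.get_task(task_id=resolve_pipeline_id())` before reporting to it, as in the pseudo-code above.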

  
  
Posted 2 years ago