Answered
Hey Currently Trying To Run A Pipeline Locally To Test A Pipeline Component With

Hey, currently trying to run a pipeline locally to test a pipeline component with PipelineDecorator.run_locally(). The first try returned a random pandas error; I fixed it, but the component execution still returns the same backtrace as if the code fix was not applied. When using PipelineDecorator.debug_pipeline() instead, the fix is applied and the component runs properly. Tried:
- rebooting
- setting cache=False in my component's decorator
- updating the pipeline version in my @PipelineDecorator.pipeline() decorator
- deleting .clearml/cache/
- deleting the pipeline from the GUI
But no effect.

I understand the main difference is that debug_pipeline() runs components as functions instead of as sub-process ClearML Tasks. Is it possible the component was cached somewhere by my local clearml-agent and that I missed it?

  
  
Posted 2 years ago
Votes Newest

Answers 20


Would gladly try to run it on a remote instance to verify the thesis about some local cache acting up, but unfortunately I also ran into an issue with the GCP autoscaler: https://clearml.slack.com/archives/CTK20V944/p1665664690293529

  
  
Posted 2 years ago

I have a pipeline with a single component:
` @PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=True,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB'
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
    """
    Convert autocut logs from a specified time window into a usable dataset in generic format.
    """
    print('[STEP 1/4] Generating dataset from autocut logs...')
    import os
    import cv2
    import sys
    import srsly
    import boto3
    import shutil
    import numpy as np
    import pandas as pd
    from clearml import Dataset
    from zipfile import ZipFile

    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()

    ... `

That I execute from there:

` @PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1"
)
def executing_pipeline(start_date, end_date):
    print("Starting VINZ Auto-Retrain pipeline...")
    print(f"Start date: {start_date}")
    print(f"End date: {end_date}")

    window_dataset_id = generate_dataset(start_date, end_date)

if __name__ == '__main__':
    PipelineDecorator.run_locally()

    executing_pipeline(
        start_date="2022-01-01",
        end_date="2022-03-02"
    ) `

During my first try I got a legitimate error since the parameter freq from pd.date_range() was missing, so I fixed it, but on further re-executions of the pipeline the same backtrace is still returned as if the code had not been changed.

But when replacing the line PipelineDecorator.run_locally() with PipelineDecorator.debug_pipeline(), the component code works properly.
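For reference, the fixed line behaves as expected in a plain interpreter. A minimal sketch with the pipeline's literal dates substituted for the parameters (assuming pandas 1.5, the version later shown in the pipeline log):

```python
import pandas as pd

# Same call as in the component, with freq specified and the
# pipeline's start/end dates passed as literal strings
time_range = pd.date_range(start="2022-01-01", end="2022-03-02", freq="D").to_pydatetime().tolist()

print(len(time_range))  # 61 days, both ends inclusive
print(time_range[0])    # 2022-01-01 00:00:00
```

which is why it looks like the fix was simply not picked up by run_locally().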

  
  
Posted 2 years ago

Thus the main difference in behavior must come from the _debug_execute_step_function property in the Controller class; currently skimming through it to try to identify a cause. Did I provide you enough info btw CostlyOstrich36?

  
  
Posted 2 years ago

CostlyOstrich36 Having the same issue running on a remote worker, even though the line works correctly in the Python interpreter and the component runs correctly in local debug mode (but not standard local mode):
File "/root/.clearml/venvs-builds/3.10/code/generate_dataset.py", line 18, in generate_dataset
    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 1128, in date_range
    dtarr = DatetimeArray._generate_range(
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 355, in _generate_range
    raise ValueError(
ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified

  
  
Posted 2 years ago

Can you please elaborate on what you're trying to do and what is failing?

  
  
Posted 2 years ago

The pipeline log indicates the same version of pandas (1.5.0) is installed; I really don't know what is happening.

  
  
Posted 2 years ago

It's funny because the line in the backtrace is the correct one, so I don't think it has anything to do with strange caching behavior.

  
  
Posted 2 years ago

Didn't have a chance to try and reproduce it, will try soon 🙂

  
  
Posted 2 years ago

So basically CostlyOstrich36 I feel like debug_pipeline() uses the latest version of my code as it is defined on my filesystem, but run_locally() uses a previous version it cached somehow.

  
  
Posted 2 years ago

CostlyOstrich36 Should I start a new issue, since I pinpointed the exact problem and the beginning of this one was clearly confusing for both of us?

  
  
Posted 2 years ago

What happens if you delete ~/.clearml? It's ClearML's cache folder.

  
  
Posted 2 years ago

I suppose you cannot reproduce the issue from your side?
Maybe it has to do with the fact that the faulty code was initially defined as a cached component.

  
  
Posted 2 years ago

I'm just not sure what error you're getting

  
  
Posted 2 years ago

I already deleted ~/.clearml/cache but I'll try deleting the entire folder.

  
  
Posted 2 years ago

When running with PipelineDecorator.run_locally() I get the legitimate pandas error that I fixed by specifying the freq param in the pd.date_range(... line in the component:
Launching step [generate_dataset]
ClearML results page:
[STEP 1/4] Generating dataset from autocut logs...
Traceback (most recent call last):
  File "/tmp/tmp2jgq29nl.py", line 137, in <module>
    results = generate_dataset(**kwargs)
  File "/tmp/tmp2jgq29nl.py", line 18, in generate_dataset
    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 1128, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 355, in _generate_range
    raise ValueError(
ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/home/jean-adrien/Projects/xhr/vinz/v2/clearml/pipelines/retraining/vinz_retraining_pipeline.py", line 236, in <module>
    executing_pipeline(
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3510, in internal_decorator
    raise triggered_exception
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3486, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 361, in trigger_all_remote_references
    func()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3230, in results_reference
    raise ValueError(
ValueError: Pipeline step "generate_dataset", Task ID=c6e3f272a7e044009d587e2d60e46d65 failed
Whereas the code runs normally, as it should since I fixed the error that caused the ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified exception, when running with @PipelineDecorator.debug_pipeline().

  
  
Posted 2 years ago

Component's prototype seems fine:
@PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=False,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB',
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):

  
  
Posted 2 years ago

The values of start_date and end_date seem to be None
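That would explain the backtrace: if the pipeline parameters arrive as None, then freq='D' is the only one of the four date_range parameters actually specified, and pandas raises exactly the error from the log. A minimal sketch (assuming pandas 1.5, as in the pipeline log):

```python
import pandas as pd

# If start and end are None, freq='D' is the only parameter
# actually specified, reproducing the error from the backtrace
try:
    pd.date_range(start=None, end=None, freq="D")
except ValueError as err:
    print(err)  # Of the four parameters: start, end, periods, and freq, exactly three must be specified
```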

  
  
Posted 2 years ago

Nope, same result after having deleted .clearml

  
  
Posted 2 years ago

print(f"start_date: {start_date} end_date: {end_date}")
time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()

  
  
Posted 2 years ago

So it seems to be an issue with how the component's parameters are passed in:
` @PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1",
    pipeline_execution_queue="Quad_VCPU_16GB"
)
def executing_pipeline(start_date, end_date):
    print("Starting VINZ Auto-Retrain pipeline...")
    print(f"Start date: {start_date}")
    print(f"End date: {end_date}")

    window_dataset_id = generate_dataset(start_date, end_date)

if __name__ == '__main__':
    PipelineDecorator.run_locally()

    executing_pipeline(
        start_date="2022-01-01",
        end_date="2022-03-02"
    ) `

Also tried specifying named parameters, like generate_dataset(start_date=start_date, end_date=end_date), but no effect.

  
  
Posted 2 years ago