I would gladly run it on a remote instance to test the theory that some local cache is acting up, but unfortunately I also ran into an issue with the GCP autoscaler: https://clearml.slack.com/archives/CTK20V944/p1665664690293529
I have a pipeline with a single component:
` @PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=True,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB'
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
    """
    Convert autocut logs from a specified time window into a usable dataset in generic format.
    """
    print('[STEP 1/4] Generating dataset from autocut logs...')
    import os
    import cv2
    import sys
    import srsly
    import boto3
    import shutil
    import numpy as np
    import pandas as pd
    from clearml import Dataset
    from zipfile import ZipFile
    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
    ... `
Which I execute from here:
` @PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1"
)
def executing_pipeline(start_date, end_date):
    print("Starting VINZ Auto-Retrain pipeline...")
    print(f"Start date: {start_date}")
    print(f"End date: {end_date}")
    window_dataset_id = generate_dataset(start_date, end_date)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline(
        start_date="2022-01-01",
        end_date="2022-03-02"
    ) `
On my first try I got a legitimate error because the freq parameter of pd.date_range() was missing, so I fixed it. But when I re-execute the pipeline, the same backtrace is still returned, as if the code had not changed.
But when I replace the line PipelineDecorator.run_locally() with PipelineDecorator.debug_pipeline(), the component code works properly.
So the difference in behavior must come from the _debug_execute_step_function property in the Controller class. I'm currently skimming through it to try to identify a cause. Did I give you enough info, by the way, CostlyOstrich36?
CostlyOstrich36 I'm having the same issue running on a remote worker, even though the line works correctly in a Python interpreter and the component runs correctly in local debug mode (but not in standard local mode):
  File "/root/.clearml/venvs-builds/3.10/code/generate_dataset.py", line 18, in generate_dataset
    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 1128, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 355, in _generate_range
    raise ValueError(
ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified
Can you please elaborate on what you're trying to do and what is failing?
The pipeline log indicates that the same version of pandas (1.5.0) is installed. I really don't know what is happening.
It's funny, because the line in the backtrace is the correct one, so I don't think it has anything to do with strange caching behavior.
Didn't have a chance to try and reproduce it, will try soon 🙂
So basically, CostlyOstrich36, I feel like debug_pipeline() uses the latest version of my code as it is defined on my filesystem, but run_locally() uses a previous version that it cached somehow.
CostlyOstrich36 Should I start a new issue, since I have pinpointed the exact problem and the beginning of this one was clearly confusing for both of us?
What happens if you delete ~/.clearml?
It's ClearML's cache folder.
I suppose you cannot reproduce the issue on your side?
Maybe it has to do with the fact that the faulty code was initially defined as a cached component.
I already deleted ~/.clearml/cache, but I'll try deleting the entire folder.
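For anyone repeating this step, a more cautious variant is to rename the folder instead of deleting it, so it can be restored afterwards. This is only a minimal sketch: `set_aside_folder` is a hypothetical helper (not part of ClearML), and the only assumption taken from the thread is that ~/.clearml is the cache folder.

```python
import time
from pathlib import Path
from typing import Optional


def set_aside_folder(folder: Path) -> Optional[Path]:
    """Rename `folder` to a timestamped backup and return the new path.

    Returns None when the folder does not exist, so calling it twice is harmless.
    """
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = folder.with_name(f"{folder.name}.bak-{stamp}")
    folder.rename(backup)  # same parent directory, so this is a plain rename
    return backup


# Example call (commented out on purpose, since it moves a real folder):
# set_aside_folder(Path("~/.clearml"))
```

If the pipeline behaves the same after the folder is set aside, the backup can simply be renamed back.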
When running with PipelineDecorator.run_locally(), I get the legitimate pandas error that I fixed by specifying the freq param in the pd.date_range(...) line in the component:
Launching step [generate_dataset]
ClearML results page:
[STEP 1/4] Generating dataset from autocut logs...
Traceback (most recent call last):
  File "/tmp/tmp2jgq29nl.py", line 137, in <module>
    results = generate_dataset(**kwargs)
  File "/tmp/tmp2jgq29nl.py", line 18, in generate_dataset
    time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/pandas/core/indexes/datetimes.py", line 1128, in date_range
    dtarr = DatetimeArray._generate_range(
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 355, in _generate_range
    raise ValueError(
ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/home/jean-adrien/Projects/xhr/vinz/v2/clearml/pipelines/retraining/vinz_retraining_pipeline.py", line 236, in <module>
    executing_pipeline(
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3510, in internal_decorator
    raise triggered_exception
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3486, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 361, in trigger_all_remote_references
    func()
  File "/home/jean-adrien/.local/lib/python3.10/site-packages/clearml/automation/controller.py", line 3230, in results_reference
    raise ValueError(
ValueError: Pipeline step "generate_dataset", Task ID=c6e3f272a7e044009d587e2d60e46d65 failed
Whereas when running with PipelineDecorator.debug_pipeline(), the code runs normally, as it should, since I fixed the error that caused the ValueError: Of the four parameters: start, end, periods, and freq, exactly three must be specified exception.
Component's prototype seems fine:
@PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=False,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB',
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
The values of start_date and end_date seem to be None.
Nope, same result after deleting .clearml
print(f"start_date: {start_date} end_date: {end_date}")
time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
So it seems to be an issue with how the component's parameters are passed in this call:
` @PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1",
    pipeline_execution_queue="Quad_VCPU_16GB"
)
def executing_pipeline(start_date, end_date):
    print("Starting VINZ Auto-Retrain pipeline...")
    print(f"Start date: {start_date}")
    print(f"End date: {end_date}")
    window_dataset_id = generate_dataset(start_date, end_date)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline(
        start_date="2022-01-01",
        end_date="2022-03-02"
    ) `
I also tried specifying named parameters, like generate_dataset(start_date=start_date, end_date=end_date), but it had no effect.
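Since the dates appear to reach the component as None, a guard at the top of the step would make this failure explicit instead of surfacing as the pandas message. pd.date_range raises that ValueError whenever the number of specified values among start, end, periods, and freq is not exactly three, which is also the case when both dates are None and only freq is set. Below is a minimal stdlib-only sketch of such a guard; `daily_window` is a hypothetical helper, not part of ClearML or pandas, and it assumes ISO 'YYYY-MM-DD' date strings as used in the pipeline call above.

```python
from datetime import date, timedelta
from typing import List, Optional


def daily_window(start_date: Optional[str], end_date: Optional[str]) -> List[date]:
    """Return every day from start_date to end_date, both included."""
    # Fail early with the names of the arguments that arrived as None.
    missing = [
        name
        for name, value in (("start_date", start_date), ("end_date", end_date))
        if value is None
    ]
    if missing:
        raise ValueError(f"component received None for: {', '.join(missing)}")

    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError(f"end_date {end} is before start_date {start}")

    # One entry per day, like a daily frequency over a closed interval.
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]
```

Called with the dates from the thread, `daily_window("2022-01-01", "2022-03-02")` returns 61 days, while `daily_window(None, None)` raises an error naming both arguments, which points directly at the parameter passing rather than at pandas.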