Answered

Hi,
I’ve recently started experimenting with ClearML and the various features it offers. I’m primarily working on creating different pipelines, and I’ve encountered an issue I’d appreciate your help with.
I’ve noticed that, somewhat inconsistently, the inputs to a pipeline step, which are passed from previous steps, are occasionally received as None. The pipeline steps are defined with the retry_on_failure=3 parameter, so after several automatic retry attempts, the inputs are eventually valid (though sometimes they still remain None after the maximum retries). For example, in a pipeline where the first step creates a dataset that is passed to the next step, occasionally the next step starts running with the dataset being None (though this issue occurs with other input types as well, not just Dataset).
Additionally, sometimes the input is not None, but accessing the data within it results in an error. For example:

  File "/tmp/tmpg0ykiwm2.py", line 22, in my_step
    if (clearml_dataset.name == 'debug_dataset'):
  File "/usr/local/lib/python3.8/dist-packages/clearml/datasets/dataset.py", line 345, in name
    return self._task.get_project_name().partition("/.datasets/")[-1]
AttributeError: 'NoneType' object has no attribute 'partition'
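
A defensive check at the top of the step can make both failure modes above (input is None, or input is present but its name property fails) show up as one readable error. This is only a minimal sketch: `safe_dataset_name` and `require_input` are hypothetical helper names, not part of ClearML, and the sketch does not address why the input arrives unresolved.

```python
from typing import Any, Optional


def safe_dataset_name(dataset: Any) -> Optional[str]:
    """Return dataset.name, or None if the input is missing or only partly resolved."""
    if dataset is None:
        return None
    try:
        return dataset.name
    except AttributeError:
        # Same failure as in the traceback above: the name property
        # raised AttributeError internally, so treat the name as unknown.
        return None


def require_input(value: Any, label: str) -> Any:
    """Raise a readable error when a step input arrived as None.

    Raising here fails the step early, so retry_on_failure can re-run it
    instead of the step continuing with a missing input.
    """
    if value is None:
        raise ValueError(f"step input {label!r} arrived as None")
    return value
```

Inside a step such as my_step, this could be used as `clearml_dataset = require_input(clearml_dataset, "clearml_dataset")` followed by `if safe_dataset_name(clearml_dataset) == 'debug_dataset': ...`.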

I’m using add_function_step for my pipeline definition.
Any help or insights would be greatly appreciated.
Thanks in advance!

  
  
Posted 5 months ago

Answers 7


@<1523701070390366208:profile|CostlyOstrich36> Any suggestions?

I also see this unstable behavior in other scenarios. For example, in a function step that just tries to get an existing dataset, for this line:

dataset = Dataset.get(dataset_project='test_pipelines', dataset_name=dataset_name, only_published=True)

I sometimes get:

Traceback (most recent call last):
  File "/tmp/tmpnndxzv32.py", line 66, in get_dataset
    dataset = Dataset.get(dataset_project='test_pipelines', dataset_name=dataset_name, only_published=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/clearml/datasets/dataset.py", line 1782, in get
    raise ValueError(
ValueError: Could not find Dataset project/name/version ('test_pipelines', 'test', None)

But in some runs it finds the dataset.
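
Since the same lookup succeeds in some runs, one possible stopgap is to retry it a few times inside the step before giving up. The sketch below is a generic helper with no ClearML dependency; `retry_call` and its parameters are hypothetical names, and retrying only works around the symptom, it does not explain why the dataset is intermittently not found.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    attempts: int = 5,
    delay_sec: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ValueError,),
) -> T:
    """Call fn() up to `attempts` times, sleeping longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: re-raise the last error unchanged.
                raise
            # Linear backoff: delay_sec, 2 * delay_sec, 3 * delay_sec, ...
            time.sleep(delay_sec * attempt)
    raise RuntimeError("unreachable")
```

In the step above, the call would become `dataset = retry_call(lambda: Dataset.get(dataset_project='test_pipelines', dataset_name=dataset_name, only_published=True))`.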

  
  
Posted 4 months ago

Hi @<1815919815257231360:profile|UpsetFrog68> , what if you move the _create_dataset function into create_dataset, does the issue still reproduce? Also, can you try setting the parents explicitly?
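
As a rough illustration of both suggestions, the sketch below keeps all dataset logic inside create_dataset (no separate helper) and passes `parents` explicitly to each add_function_step call. The pipeline name, version, the `source_dir` argument, the dependency order between the steps, and the choice to pass a dataset ID string between steps rather than a Dataset object are all assumptions made for the example, not details from the original script.

```python
def create_dataset(dataset_project: str, dataset_name: str, source_dir: str) -> str:
    # Imports live inside the function because each step runs as its own task.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    ds.add_files(path=source_dir)
    ds.upload()
    ds.finalize()
    return ds.id  # assumption: hand over the ID, not the Dataset object


def use_dataset(dataset_id: str) -> int:
    from clearml import Dataset

    ds = Dataset.get(dataset_id=dataset_id)
    return len(ds.list_files())


def build_pipeline(source_dir: str):
    from clearml import PipelineController

    # Hypothetical pipeline name and version.
    pipe = PipelineController(name="dataset_pipeline", project="test_pipelines", version="0.0.1")
    pipe.add_function_step(
        name="create_dataset",
        function=create_dataset,
        function_kwargs=dict(
            dataset_project="test_pipelines", dataset_name="test", source_dir=source_dir
        ),
        function_return=["dataset_id"],
        retry_on_failure=3,
    )
    for step_name, parents in (
        ("use_dataset", ["create_dataset"]),
        ("use_dataset2", ["create_dataset", "use_dataset"]),  # assumed ordering
    ):
        pipe.add_function_step(
            name=step_name,
            function=use_dataset,
            function_kwargs=dict(dataset_id="${create_dataset.dataset_id}"),
            function_return=["num_files"],
            parents=parents,  # explicit, instead of inferred from the kwargs
            retry_on_failure=3,
        )
    return pipe  # the caller decides how to launch it, e.g. pipe.start_locally(...)
```

Whether explicit parents or ID passing changes the intermittent behaviour is exactly what this variant would help test; it is not a confirmed fix.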

  
  
Posted 4 months ago

Hi @<1815919815257231360:profile|UpsetFrog68> , can you provide a standalone code snippet that would reproduce this occasional behaviour?

  
  
Posted 5 months ago

Are you using a self-hosted server or app.clear.ml?

  
  
Posted 5 months ago

self hosted server

  
  
Posted 5 months ago

Hi @<1523701070390366208:profile|CostlyOstrich36> ,
I've created a standalone script with a simple pipeline that demonstrates the issue.
The pipeline contains 3 steps: create_dataset, use_dataset, use_dataset2.

I ran the same script 3 times and got different results:

  • The third step failed twice, then succeeded on the third attempt.
  • All retries failed, so the pipeline failed.
  • Success.
  
  
Posted 5 months ago

Does anyone have any ideas on how to solve the above issues?

  
  
Posted 4 months ago