Is it possible for a ClearML pipeline step to log a folder instead of numpy/pickle objects?

Hey there, is it possible for a ClearML pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I am searching for, but I am not sure.

Posted 2 years ago

Answers 13


So in which scenario do you want to keep those folders as artifacts and where would you like to store them?

Posted 2 years ago

CostlyOstrich36, super, thanks for confirming! I then have a follow-up question: are the artifacts duplicated (copied), or just referenced?

Posted 2 years ago

Yes exactly

Posted 2 years ago

I think it depends on your code and the pipeline setup. You can also cache steps, avoiding the entire need to worry about artifacts.
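Step caching can be sketched roughly like this (a minimal sketch, not a definitive implementation: `make_folder_step` and the pipeline/project names are hypothetical; the relevant knob is the real `cache_executed_step` parameter of `add_function_step`):

```python
import os

def make_folder_step(output_dir="step_output"):
    # Hypothetical step body: produce a folder of results on disk.
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "part.txt"), "w") as f:
        f.write("partial result")
    return output_dir

def build_pipeline():
    # Lazy import so the sketch can be read without a ClearML setup.
    from clearml import PipelineController

    pipe = PipelineController(
        name="folder-pipeline",   # hypothetical names
        project="examples",
        version="1.0.0",
    )
    # cache_executed_step=True lets ClearML reuse a previous run of this step
    # when its code and inputs are unchanged, skipping re-execution entirely.
    pipe.add_function_step(
        name="make_folder",
        function=make_folder_step,
        function_return=["folder_path"],
        cache_executed_step=True,
    )
    return pipe
```

With caching on, an unchanged step is not re-run at all, so its (potentially big) folder artifact is simply reused from the earlier execution.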

Posted 2 years ago

In all the steps I want to store them as artifacts on S3, because it’s very convenient.
The last step should merge them all, i.e. it needs to know all the artifacts of the previous steps.

Posted 2 years ago

Do you mean if they are shared between steps or if each step creates a duplicate?

Posted 2 years ago

So in my use case each step would create a (potentially big) folder and store it as an artifact. The last step should “merge” all the previous folders. The idea is to split the work among multiple machines, in parallel. I would like to avoid these potentially big folder artifacts also being stored in the pipeline task, because that one will be running on the services queue of the clearml-server instance, which will definitely not have enough space to handle all of them.

Posted 2 years ago

So if all artifacts are logged in the pipeline controller task, I need the last step to access all the artifacts from the pipeline task; I would need to execute something like PipelineController.get_artifact() in the last step's task.

Posted 2 years ago

I would also like to avoid any copies of these artifacts on S3 (to avoid double costs, since some folders might be big).

Posted 2 years ago

I guess I can work around this by passing the pipeline controller's task id to the last step, so that the last step can download all the artifacts from the controller task.
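That workaround might look roughly like this (a hedged sketch, not tested against a live server: `fetch_step_folders` and `merge_folders` are hypothetical helpers, and the controller's task id is assumed to arrive as a step parameter):

```python
import os
import shutil

def fetch_step_folders(controller_task_id, artifact_names):
    """Hypothetical: download each named artifact registered on the
    controller task to a local path, using ClearML's artifact accessors."""
    from clearml import Task  # lazy import; needs a configured ClearML env
    controller = Task.get_task(task_id=controller_task_id)
    return [controller.artifacts[name].get_local_copy()
            for name in artifact_names]

def merge_folders(folder_paths, dest):
    """Merge the downloaded folders into one output directory
    (files with the same relative path overwrite earlier copies)."""
    os.makedirs(dest, exist_ok=True)
    for path in folder_paths:
        shutil.copytree(path, dest, dirs_exist_ok=True)  # Python 3.8+
    return dest
```

Note that `get_local_copy()` downloads from the artifact's storage (e.g. S3) to the worker running the step, so the merge happens on that machine rather than on the server.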

Posted 2 years ago

JitteryCoyote63, heya, yes it is :)
You can save the entire folder as an artifact.
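For example (a minimal sketch; `upload_folder` and `folder_has_content` are hypothetical wrappers — when `artifact_object` points at a directory, ClearML packages the folder and uploads it as a single artifact):

```python
import os

def upload_folder(folder_path, name="step_folder"):
    """Hedged sketch: upload a whole local folder as one ClearML artifact."""
    from clearml import Task  # lazy import; requires a configured ClearML env
    task = Task.current_task()
    # artifact_object can be a directory path, not just a numpy/pickle object.
    task.upload_artifact(name=name, artifact_object=folder_path)

def folder_has_content(folder_path):
    # Small local sanity check before uploading.
    return os.path.isdir(folder_path) and bool(os.listdir(folder_path))
```

Where the artifact actually lands is controlled by the task's output destination (e.g. an `s3://` URI), so the folder can go straight to S3 rather than to the server.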

Posted 2 years ago

What's the use case?

Posted 2 years ago