Answered
[Pipeline] Hey, is it possible to specify the output uri for Pipelines and their Components using Pipeline decorators? I would like to store Pipeline artifacts and Component artifacts on S3.

  
  
Posted one year ago

Answers 7


Hmm. Okay. Thanks

  
  
Posted one year ago

Ahh that’s great, thank you.

And then I could use storage manager or whatever to get the files. Perfect
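
For example, a rough sketch of fetching the files with StorageManager (the S3 path here is just a placeholder for whatever URL the pipeline step returned):

from clearml import StorageManager

# download a local copy of the files behind the returned S3 URL
local_path = StorageManager.get_local_copy(remote_url='s3://my-bucket/some/artifact/path')
print(local_path)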

  
  
Posted one year ago

The return objects were stored to S3 but PipelineDecorator.upload_artifact still uploaded to the file server. Not sure what was up with that but as explained in my next comment it did work when I tried again.

It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly, but that is another issue for another thread that I will be starting on Monday.

Have a good weekend

  
  
Posted one year ago

I have added a lot of detail to this, sorry.

The inline comments in the code talk about that specific script/implementation.

I have added a lot of context in the doc string at the top.

  
  
Posted one year ago

So, the way it works: when you run a component, the return value (together with the entire function execution) is cached. Basically:

this did NOT add the artifact to the pipeline via caching on subsequent runs ❌

you just need to do:

PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url

This will return the URL of the uploaded images (i.e. the S3 bucket), which means that even if the component is cached you will still get the URL back:

image_bucket = gen_random_images()
second_step(image_bucket)

BTW: you can always get the currently executing Task (from any part of the pipeline) with Task.current_task(); there is no need to call "pipe._get_pipeline_task()".
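
Putting the above together, a minimal sketch of the full pattern (the import path, decorator arguments and directory name are my assumptions, not from the snippets above):

from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['image_bucket'], cache=True)
def gen_random_images():
    img_dir = '/tmp/images'  # hypothetical local directory the images are written to
    # ... generate the images into img_dir ...
    # upload explicitly, then return the resulting URL so the URL itself is
    # what gets cached and passed on to the next step
    PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
    return Task.current_task().artifacts['images'].url

@PipelineDecorator.component()
def second_step(image_bucket):
    # image_bucket is the S3 URL returned above (also on a cache hit)
    print(image_bucket)

@PipelineDecorator.pipeline(name='images pipeline', project='examples', version='0.1')
def run_pipeline():
    image_bucket = gen_random_images()
    second_step(image_bucket)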

  
  
Posted one year ago

Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
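
For reference, a minimal sketch of what that could look like in the agent's clearml.conf (the bucket name is a placeholder, and the credentials section is only needed if they are not already available via environment variables or an IAM role):

sdk {
    development {
        # every task / pipeline step run by this agent will default its output here
        default_output_uri: "s3://my-bucket/clearml"
    }
    aws {
        s3 {
            key: ""
            secret: ""
            region: ""
        }
    }
}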

  
  
Posted one year ago

It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly,

Both use the exact same mechanism for uploading artifacts (i.e. including caching for downloaded artifacts). In terms of caching pipeline components, this is on a component level (i.e. same code/task and same arguments equals a cache hit).
What exactly are you getting? How is it that "PipelineDecorator.upload_artifact" uploads to a different storage? Is that reproducible?
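
As a rough illustration of what component-level caching means (argument values are made up):

@PipelineDecorator.component(cache=True)
def preprocess(n_samples):
    ...

# on a subsequent pipeline run:
preprocess(n_samples=100)   # same code, same arguments -> served from cache
preprocess(n_samples=200)   # different arguments -> executed again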

  
  
Posted one year ago