[Pipeline] Hey, is it possible to specify the output uri for Pipelines and their Components using Pipeline decorators? I would like to store Pipeline artifacts and Component artifacts on S3.

Posted 4 months ago

Answers 7

Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the agent's clearml.conf file, wdyt?
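
For reference, a minimal sketch of what that setting could look like in the agent's clearml.conf, assuming the standard sdk.development section. The bucket name and path are placeholders, and S3 credentials would still need to be configured separately.

```
sdk {
    development {
        # Used when no output_uri is passed explicitly by the code
        default_output_uri: "s3://my-bucket/clearml-artifacts"  # hypothetical bucket/path
    }
}
```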

Posted 4 months ago

Ahh that’s great, thank you.

And then I could use storage manager or whatever to get the files. Perfect

Posted 4 months ago

It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly,

Both use the exact same mechanism for uploading artifacts (including caching of downloaded artifacts). Caching of pipeline components happens at the component level, i.e. the same code/task with the same arguments equals a cache hit.
What exactly are you getting? How is it that "PipelineDecorator.upload_artifact" uploads to a different storage? Is that reproducible?

Posted 4 months ago

The return objects were stored to S3 but PipelineDecorator.upload_artifact still uploaded to the file server. Not sure what was up with that but as explained in my next comment it did work when I tried again.

It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly, but that is another issue for another thread that I will be starting on Monday.

Have a good weekend

Posted 4 months ago

I have added a lot of detail to this, sorry.

The inline comments in the code talk about that specific script/implementation.

I have added a lot of context in the doc string at the top.

Posted 4 months ago

Hmm. Okay. Thanks

Posted 4 months ago

So the way it works: when you run a component, the entire function execution is cached together with its return value, so basically:

this did NOT add the artifact to the pipeline via caching on subsequent runs ❌

you just need to do:

PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url

This will return the URL of the uploaded images (i.e. in the S3 bucket), which means that if the component is cached you will still get the URL:

image_bucket = gen_random_images()

You can always get the currently executed Task (of any part of the pipeline) with Task.current_task(); there is no need to call "pipe._get_pipeline_task()".

Posted 4 months ago