It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly
Both use the exact same mechanism for uploading artifacts (i.e. including caching for downloaded artifacts). As for caching pipeline components, this is done on a component level (i.e. same code/task + same arguments equals a cache hit).
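For reference, caching is enabled per component. A minimal sketch (the decorator usage follows the ClearML SDK; the function body is just a placeholder):

from clearml.automation.controller import PipelineDecorator

# cache=True: same component code + same input arguments => cache hit,
# and the previously stored execution (including its return value) is reused
@PipelineDecorator.component(cache=True)
def preprocess(source_url):
    return source_url.lower()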
What exactly are you getting? How is it that PipelineDecorator.upload_artifact uploads to a different storage? Is that reproducible?
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
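For example, in the agent's clearml.conf (the key is the one at the link above; the bucket URL here is just a placeholder):

sdk {
    development {
        # default location for output models and artifacts
        default_output_uri: "s3://my-bucket/clearml"
    }
}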
The return objects were stored to S3, but PipelineDecorator.upload_artifact still uploaded to the file server. Not sure what was up with that, but as explained in my next comment, it did work when I tried again.
It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly, but that is another issue for another thread that I will be starting on Monday.
Have a good weekend
I have added a lot of detail to this, sorry.
The inline comments in the code talk about that specific script/implementation.
I have added a lot of context in the docstring at the top.
Ahh that’s great, thank you.
And then I could use StorageManager or whatever to get the files. Perfect
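Something like this, I guess (just a sketch, with image_bucket being the URL returned by the first step):

from clearml import StorageManager

# download (and locally cache) a copy of the uploaded images
local_images = StorageManager.get_local_copy(remote_url=image_bucket)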
So the way it works: when you run a component, the return value together with the entire function execution is cached. Basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs ❌
you just need to do:
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
This will return the URL of the uploaded images (i.e. the S3 bucket), which means that if this component is cached you will still get the URL on subsequent runs:
image_bucket = gen_random_images()
second_step(image_bucket)
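Putting the pieces together, a minimal end-to-end sketch (the decorator arguments, project/pipeline names and the dummy image generation are my assumptions; the upload/return lines are the ones from above):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True, return_values=['image_bucket'])
def gen_random_images():
    # imports go inside the component so it can run as a standalone task
    import os
    from clearml import Task
    from clearml.automation.controller import PipelineDecorator
    img_dir = 'images'
    os.makedirs(img_dir, exist_ok=True)
    with open(os.path.join(img_dir, 'img_0.txt'), 'w') as f:
        f.write('placeholder image')
    # upload the folder as an artifact of the component's own Task
    PipelineDecorator.upload_artifact(
        name='images', artifact_object=img_dir, wait_on_upload=True)
    # return the remote URL; on a cache hit this stored return value is reused
    return Task.current_task().artifacts['images'].url

@PipelineDecorator.component()
def second_step(image_bucket):
    from clearml import StorageManager
    # fetch a local copy of the uploaded images from the remote URL
    local_images = StorageManager.get_local_copy(remote_url=image_bucket)
    print('images available at', local_images)

@PipelineDecorator.pipeline(name='images pipeline', project='examples', version='0.1')
def run_pipeline():
    image_bucket = gen_random_images()
    second_step(image_bucket)

if __name__ == '__main__':
    run_pipeline()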
BTW: you can always get the currently executing Task (of any part of the pipeline) with Task.current_task(); no need to call pipe._get_pipeline_task().
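e.g. from anywhere inside a component (sketch):

from clearml import Task

task = Task.current_task()  # the Task of the currently running step
print(task.id, list(task.artifacts.keys()))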