I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense. Any chance you can open a GitHub issue with a feature request so that we do not forget?
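(For context, a minimal sketch of what is already configurable today, assuming the ClearML `PipelineController` API: the docker image / docker args / requirements can be set per step via `add_function_step`, while the feature request above is about having the same options on the controller task itself. All names, queues, and paths below are placeholders.)

```python
from clearml import PipelineController

def preprocess(raw_path: str):
    # placeholder step body
    return raw_path

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")
pipe.set_default_execution_queue("default")

pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(raw_path="s3://bucket/raw.csv"),  # placeholder input
    function_return=["data"],
    docker="python:3.9",        # docker image for this step
    docker_args="--ipc=host",   # extra docker run arguments for this step
    packages=["pandas>=1.3"],   # requirements installed for this step
)

pipe.start(queue="services")    # the controller itself runs in the services queue
```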
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, the download populates the local cache, so the second time the file is accessed it is served from the cache.
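(A small sketch of that flow, assuming the file is fetched through ClearML's `StorageManager`; the URL is a made-up placeholder.)

```python
from clearml import StorageManager

# Hypothetical artifact URL, for illustration only
url = "https://files.clear.ml/examples/data.pkl"

# First access on this machine: downloads from the file server
# and stores the file in the local ClearML cache.
local_path_1 = StorageManager.get_local_copy(remote_url=url)

# Second access on the same machine: resolved from the local cache,
# no new download from the file server.
local_path_2 = StorageManager.get_local_copy(remote_url=url)
assert local_path_1 == local_path_2
```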
Wouldn’t it be more performant for the first component to store its result in the local cache alongside uploading it to the file server? That way, if the next component runs on the same node, it wouldn’t need to download it from the file server at all.
I think you are correct, since the first time it will not pass through the cache...
Not sure if there is an easy "path" to tell the cache "put this file in the cache"...
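(Just to illustrate the "write-through" idea being discussed, a conceptual sketch; this is NOT a ClearML API, and the cache directory and helper name are made up for illustration.)

```python
import shutil
from pathlib import Path

# Hypothetical cache location -- not the real ClearML cache layout.
LOCAL_CACHE_DIR = Path("/tmp/pipeline_artifact_cache")

def upload_with_local_copy(result_path: str, upload_fn) -> str:
    """Upload the result to the file server and also keep a local copy,
    so a follow-up step on the same node would not need to download it."""
    LOCAL_CACHE_DIR.mkdir(parents=True, exist_ok=True)
    # write-through copy kept on the local node
    shutil.copy2(result_path, LOCAL_CACHE_DIR / Path(result_path).name)
    # e.g. a call that uploads the file to the file server and returns its URL
    return upload_fn(result_path)
```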