Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think what you are looking for is a multi-machine cache (which is fully supported). Basically, mount an NFS/SMB folder from a NAS on each of those machines, configure the cache folder to point to it, and now you do not need to worry about affinity?
no?
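For reference, a rough sketch of what that could look like in `clearml.conf` on each machine. The mount point `/mnt/shared/clearml-cache` is a made-up path for illustration; it assumes the NAS share is already mounted at the same location on every machine (and is visible inside the container if the agent runs in docker mode):

```
sdk {
    storage {
        cache {
            # hypothetical mount point of the shared NFS/SMB folder
            default_base_dir: "/mnt/shared/clearml-cache"
        }
    }
}
```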
Is there a way to group A and B into a sub-pipeline, have the pipeline be queued and executed remotely, but the tasks A and B inside it be treated like local tasks? or something like that?
Actually yes, you could have a pipeline AB that always "executes locally" (meaning it does not schedule itself or its components), where A and B are the components. From the original pipeline's perspective, the component is a single Task AB (which is this new pipeline). The only caveat is that pipeline AB and tasks A and B need to be in the same git repo.
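A minimal sketch of that idea, not a definitive implementation. It assumes `PipelineController` with function steps and `start_locally(run_pipeline_steps_locally=True)`; the step bodies, the project name `examples`, the pipeline name `AB`, and the file `prepared.txt` are placeholders invented for illustration:

```python
def step_a(cache_dir):
    # Placeholder for task A: produce something on the local disk.
    # Imports live inside the function because function steps are packaged standalone.
    import os

    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "prepared.txt")
    with open(path, "w") as f:
        f.write("row1\nrow2\nrow3\n")
    return path


def step_b(prepared_path):
    # Placeholder for task B: consume what A left on the same machine.
    with open(prepared_path) as f:
        return sum(1 for _ in f)


def run_ab_locally(cache_dir):
    # Imported here so the step functions above stay usable without clearml.
    from clearml import PipelineController

    pipe = PipelineController(name="AB", project="examples", version="0.1")
    pipe.add_function_step(
        name="A",
        function=step_a,
        function_kwargs=dict(cache_dir=cache_dir),
        function_return=["prepared_path"],
    )
    pipe.add_function_step(
        name="B",
        function=step_b,
        # "${A.prepared_path}" refers to the value returned by step A
        function_kwargs=dict(prepared_path="${A.prepared_path}"),
        function_return=["row_count"],
        parents=["A"],
    )
    # Run the pipeline logic in this process and the steps as local
    # subprocesses, so A and B share this machine's disk and cache.
    pipe.start_locally(run_pipeline_steps_locally=True)
```

Calling `run_ab_locally(...)` from the script that the outer pipeline enqueues as its single AB step should keep A and B on whichever machine picks that step up.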