I see, thanks! So, correct me if I'm wrong: there is no way for the agent to deal with resource management, it's simply not its job (seems logical). Also, there is no internal scheduler in Trains, I can just create more queues and spin up N agents, each with their own resource set, even in different machines. If that's it, I only see these alternatives:
forget about "abstracting resources" and simply use more queues and/or manually partition stuff among agents Use Trains as experiment tracker, while configuring the two machines with SLURM for instance Setup a k8s cluster and implement your proposed exampleThe last point is a bit fuzzy to me, mostly because I never tinkered with k8s before (and it seems quite overkill to me for a dual-node configuration), so I'm more prone to attempt #2 and fallback to the second one. Do you see other alternatives (i.e. maybe slurm can work with agents)? Thanks again and sorry to bother!