available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are busy and we still "need" an additional instance?
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3 node Task; the Task clones itself (Task B) and enqueues the clone on a "very high priority queue". Task A waits until Task B is running. Agent B picks up Task B and starts running it, and then Task A "talks" to Task B. This is the equivalent of "allocating 2 agents" (basically you have to reserve one and wait for the other to become available).
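To make the flow above concrete, here is a rough sketch that simulates it with plain Python threads. It does not use the real agent or Task API; `run_simulation`, `agent_a`, `agent_b`, and the `timeout` value are all hypothetical names chosen for illustration, and an in-memory `queue.Queue` stands in for the "very high priority queue". The timeout is my own assumption about how Task A might bound its wait, not something the scheduler is known to do.

```python
import queue
import threading


def run_simulation(timeout=5.0):
    """Simulate Task A cloning itself and waiting for the clone to start."""
    high_priority_queue = queue.Queue()   # stand-in for the "very high priority queue"
    task_b_running = threading.Event()    # set once Agent B has started Task B
    log = []
    log_lock = threading.Lock()

    def record(message):
        with log_lock:
            log.append(message)

    def agent_a():
        record("Agent A: pulled Task A")
        # Task A "clones itself" and enqueues the clone (Task B).
        record("Task A: enqueued Task B")
        high_priority_queue.put("Task B")
        # Task A holds its agent and blocks until Task B is running.
        if not task_b_running.wait(timeout):
            record("Task A: gave up waiting for Task B")
            return
        record("Task A: talking to Task B")

    def agent_b():
        try:
            task = high_priority_queue.get(timeout=timeout)
        except queue.Empty:
            record("Agent B: nothing to run")
            return
        record(f"Agent B: running {task}")
        task_b_running.set()

    threads = [threading.Thread(target=agent_a), threading.Thread(target=agent_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    for line in run_simulation():
        print(line)
```

Note that in this sketch Agent A stays occupied for the whole wait, which is exactly the "reserve one and wait for the other" cost: if no second agent ever frees up, Task A just sits there until the (assumed) timeout expires.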
BTW: Is nvcc multi-node or multi-GPU? (I thought it was single-node multi-GPU)