Unanswered
Hi All,
I Have A Question Regarding Multi-Node Training Using The Clearml-Agent. What Is The Recommended Setup In This Case? Say I Have 3 Nodes With 3 Agents Running On Them. How Do I Make Sure They All Run The Same Job?
I thought of some sort of gang-scheduling scheme should be implemented on top of the job.
Maybe the agents should somehow go through a barrier with a counter and wait there until enough agents arrived
190 Views
0
Answers
3 years ago
one year ago