Hi all, I'm training a model using AWS SageMaker and monitoring with a ClearML server on-prem. Works well enough when the training is split (Horovod, with a task on each rank). But when I try and spawn eval jobs to run on different AWS machines, it seems
Hi IrateDolphin19,
Can you share a simple schema of what you're doing or trying to achieve? Are you using pipelines for this?
2 years ago
one year ago