Hello, everyone!
I have a question regarding ClearML features.
We run into situations where some of the agents working on an HPO die for various reasons. Some workers go offline, or resources need to be temporarily detached for other needs.
Quick question: when you say the HPO Task, do you mean the HPO controller logic Task (i.e. the one launching the training jobs), or the actual training job itself (i.e. one running with a specific set of parameters decided by the HPO controlling task)?
AgitatedDove14 Sorry, my bad! By HPO task I mean the actual training job itself.
We run the HPO controller logic Task on a separate CPU-only machine, so we can assume that this task is always on. Only the training jobs can go offline (for the reasons mentioned above).