Hi, the latest k8sglue-example.py was last committed about 4 months ago. Are you referring to that version?
Like creating multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from the children. Technically workable, but not sure if it's the best approach since different people have different batch sizes in mind.
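To make the parent/child idea above concrete, here is a rough sketch of just the index bookkeeping, in plain Python with no ClearML calls. The helper names (`chunk_ranges`, `plan_child_datasets`) and the `"all"` parent name are made up for illustration, and the 100k chunk size is only the figure from the example above, so treat it as a parameter.

```python
def chunk_ranges(total, chunk_size=100_000):
    """Split range(total) into consecutive (start, stop) index ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total))
        for start in range(0, total, chunk_size)
    ]


def plan_child_datasets(total, chunk_size=100_000, parent_name="all"):
    """Describe one hypothetical child dataset per index range.

    This only builds a plan (names and index ranges); creating and
    uploading the actual datasets is left to whatever tooling is used.
    """
    return [
        {"name": f"child{i}", "parent": parent_name, "start": start, "stop": stop}
        for i, (start, stop) in enumerate(chunk_ranges(total, chunk_size), start=1)
    ]


if __name__ == "__main__":
    for child in plan_child_datasets(250_000):
        print(child)
```

Since the chunk size is a parameter, people with different batch sizes in mind would each end up with a different set of children, which is the drawback mentioned above.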
No issues. I know it's hard to track open threads with Slack. I wish there were a plugin for this too. 🙂
That didn't work either...
Yes it is! But ClearML didn't support multi-node training out of the box in a way that streamlines the process, so we are trying to figure out a way to do it.
We are using k8s glue to spawn the job. Could you explain in detail, step by step, what happens when the above code executes?
Thanks. Have a better understanding now.
Oh, this means I have been using the latest agent, which is v1.0.0. The problems were still there.
Hi Jake, thanks for the suggestion, let me try it out.