Hi FloppyDeer99
What is the meaning of no real scheduling
I think the meaning is that from the moment a k8s job is created, k8s is in charge of actually spinning up the container. Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
The idea of the clearml k8s-glue is that the glue will launch a job on the k8s cluster only if it is sure there are enough resources to actually spin the job now (as opposed to sometime in the future). This means the priority and order are kept on the clearml queue, as the glue will not pop a job unless it will be executed momentarily.
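To make the "only pop a job if it can run now" idea concrete, here is a minimal sketch. It is not the actual glue code; `drain_queue`, `has_capacity`, `launch` and the task names are all hypothetical, and the capacity check is modelled as a simple pod-count limit just for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_queue(
    queue: Deque[str],
    has_capacity: Callable[[], bool],
    launch: Callable[[str], None],
) -> List[str]:
    """Pop tasks in FIFO order, but only while the cluster can run one right now."""
    launched: List[str] = []
    while queue and has_capacity():
        task_id = queue.popleft()  # ordering is decided here, on the queue side
        launch(task_id)            # from this point on, k8s owns the pod
        launched.append(task_id)
    return launched


# Hypothetical usage: capacity is "fewer than MAX_PODS pods running".
MAX_PODS = 2
running: List[str] = []
pending: Deque[str] = deque(["task-a", "task-b", "task-c"])

started = drain_queue(pending, lambda: len(running) < MAX_PODS, running.append)
print(started)        # tasks handed to the cluster
print(list(pending))  # tasks still waiting, in their original order
```

The point of the design is that anything left in `pending` keeps its queue position, so priority and order never depend on what k8s does with pods that are already submitted.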
Makes sense?
Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
K8s can schedule pods with different priorities. So maybe no real scheduling means there is no ClearML scheduling after applying the pod to k8s.
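For reference, Kubernetes does have a Pod Priority and Preemption feature, based on a `PriorityClass` object that a pod references through `spec.priorityClassName`. Below is a small sketch that builds both manifests as Python dicts and prints them as JSON (which `kubectl apply -f` accepts). The names `high-priority-training`, `example-task` and the image are made up for illustration, and whether this fits a given glue pod template is an assumption, not something confirmed in this thread.

```python
import json


def priority_class(name: str, value: int) -> dict:
    """Build a PriorityClass manifest; a higher value means a higher priority."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": "Illustrative priority class for training pods",
    }


def pod_with_priority(pod_name: str, image: str, priority_class_name: str) -> dict:
    """Build a Pod manifest that references a PriorityClass by name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "priorityClassName": priority_class_name,
            "containers": [{"name": "main", "image": image}],
        },
    }


# Hypothetical names, for illustration only.
pc = priority_class("high-priority-training", 1000)
pod = pod_with_priority("example-task", "python:3.9", pc["metadata"]["name"])
print(json.dumps(pc, indent=2))
print(json.dumps(pod, indent=2))
```

Note that this priority only affects how the k8s scheduler treats pods that were already submitted; it does not change the order in which tasks leave the ClearML queue.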
only if it is sure there are enough resources to actually spin the job now
In version 1.0.2, I could not find any logic for checking k8s resources in k8s.py. Will it be implemented in the future?
Hi FloppyDeer99,
I'm not sure what you mean by "real" - using the k8s glue you can control the pod/job template and use that to enforce node selectors etc. What do you mean by docker image verification? All in all, it all seems like stuff that can be easily added and/or modified.
What is the meaning of ‘the scheduling is done before that’? Does it mean that ClearML will schedule the task for me?
The README says there is no real scheduling when using Kubernetes. But I think that whether you launch a Pod or a Job, the Pod will be scheduled by the k8s scheduler, except when nodeName is specified in the Pod specification. This confuses me a lot. What is the meaning of "no real scheduling"? I also don't understand "no verification of docker image".
K8s can schedule pods with different priorities.
I'm not sure I agree here, could you refer me to the docs on this ability in k8s ?
So maybe no real scheduling means there is no ClearML scheduling after applying the pod to k8s.
That is correct 🙂
Will it be implemented in the future?
Yes, this is an enterprise feature. In the community version you can specify a --max-pods limit (which will cause the glue never to pull a job once it hits the max-pods limit).
Regarding #1, there is no scheduling once execution reaches the k8s cluster, but the scheduling is done before that, using the glue.
I think that in the paid tier there is support for dynamic GPU scheduling that allows you to specify allocation requirements on the task.
So "no real scheduling" in the README means exactly that ClearML will not do scheduling for the task, which may be provided in the paid tier? In other words, Kubernetes will schedule the Pod for me. Is that right?