Yeah, was planning to use nested projects for that
AgitatedDove14 - any pointers on how to run GPU tasks with the k8s glue? How do I control the queue and differentiate tasks that need CPU vs GPU in this context?
Got it. I've never run a GPU workload in EKS before. Do you have any experience, and are there things to watch out for?
AgitatedDove14 - the AWS autoscaler is not k8s-native, right? That's roughly the point I'm getting at.
Which would also mean that the system knows which datasets are used in which pipelines, etc.
Running multiple `k8s_daemon` processes, right? i.e. `k8s_daemon("1xGPU")`
and `k8s_daemon("cpu")`
right?
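(To spell out what I mean - a rough sketch based on clearml-agent's `k8s_glue_example.py`; the queue names and template file names are placeholders, not real cluster config:)
```python
# Rough sketch: one k8s glue daemon per queue, each with its own pod template.
# Based on clearml-agent's k8s_glue_example.py; queue names and template file
# names below are placeholders.
from clearml_agent.glue.k8s import K8sIntegration

def run_glue_daemon(queue_name, template_yaml):
    # template_yaml points at a pod template requesting the right resources
    # (e.g. an nvidia.com/gpu limit for the GPU queue)
    glue = K8sIntegration(template_yaml=template_yaml)
    glue.k8s_daemon(queue_name)

# Run these as two separate processes:
#   run_glue_daemon("1xGPU", "gpu_pod_template.yaml")
#   run_glue_daemon("cpu", "cpu_pod_template.yaml")
```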
OK, I found what's happening:
I had an additional `Task.init()`
- just a blank one, to get the project name. Adding the disable to that one as well fixed the issue.
Do we support GPUs in a) docker mode b) k8s glue?
The pipeline code itself is pretty standard.
In this case, it was specifically because of the pickle protocol version difference between Python 3.7 and 3.8.
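(For anyone else hitting this: pickle protocol 5 only exists on Python 3.8+, so pinning an older protocol on the writing side avoids it - a minimal sketch with a placeholder payload:)
```python
import pickle

# Python 3.8 added pickle protocol 5 (PEP 574), which Python 3.7 cannot
# read. Pinning an explicit protocol on the writing side keeps artifacts
# loadable on both interpreters; protocol 4 is readable on Python 3.4+.
data = {"weights": [0.1, 0.2, 0.3]}  # placeholder payload
payload = pickle.dumps(data, protocol=4)
assert pickle.loads(payload) == data
```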
As in: if there are queued jobs, the first level of scaling is new pods, and the second level is new nodes in the cluster.
That, or in clearml.conf, or both.
My question is - I have this in a notebook now. How can I make it such that any update to the upstream database triggers this data transformation step?
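(What I have in mind is roughly this polling loop - just a sketch; `get_last_update_ts()` and the task ID are hypothetical placeholders:)
```python
import time

from clearml import Task

# Sketch of a poll-and-trigger loop: watch the upstream database for changes
# and enqueue a clone of the transformation task when one appears.
# BASE_TASK_ID and get_last_update_ts() are hypothetical placeholders.
BASE_TASK_ID = "<transformation-task-id>"

def get_last_update_ts():
    # Placeholder: query the upstream database for its last-modified time
    return 0

last_seen = get_last_update_ts()
while True:
    ts = get_last_update_ts()
    if ts != last_seen:
        cloned = Task.clone(source_task=BASE_TASK_ID)
        Task.enqueue(cloned, queue_name="default")
        last_seen = ts
    time.sleep(300)  # poll every 5 minutes
```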
I'm not able to understand what's really happening in the links.
I am doing something like this with a YAML-based pipeline DSL.
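(Roughly this shape - a hypothetical snippet, not a real schema:)
```yaml
# Hypothetical shape of the YAML pipeline DSL, not a real schema;
# task IDs and queue names are placeholders.
pipeline:
  name: data-transform
  steps:
    - name: extract
      task: <extract-task-id>
      queue: cpu
    - name: train
      task: <train-task-id>
      queue: 1xGPU
      requires: [extract]
```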
Will try it out. Pretty impressed 🙂
BTW, when I started using S3, I thought I needed to specify `output_uri` for each task. I soon realized that you only need the prefix you want to write into, and ClearML will take care of appending the project etc. to the path. So for most use cases, a single output URI set in the conf should work.
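(For reference, per-task it's just the `output_uri` argument; the bucket/prefix below is a placeholder, and the same value can go under `sdk.development.default_output_uri` in clearml.conf:)
```python
from clearml import Task

# "s3://my-bucket/clearml" is a placeholder prefix; ClearML appends the
# project/task structure under it automatically.
task = Task.init(
    project_name="examples",
    task_name="train",
    output_uri="s3://my-bucket/clearml",
)
```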
I use a custom Helm chart and the Terraform Helm provider for these things.
AgitatedDove14 - does having this template work for updating the base image:
```yaml
spec:
  containers:
    - image: nvidia/cuda:11.4.1-cudnn8-runtime-ubuntu20.04
```
I am seeing that it still picks up `nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04`
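(My working assumption - not verified - is that a per-task base docker image can shadow the template's image, so I'm also setting it on the task itself:)
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train")
# Assumption: the task's own base docker image setting may take precedence
# over the pod template, so pin the CUDA 11.4 image here as well.
task.set_base_docker("nvidia/cuda:11.4.1-cudnn8-runtime-ubuntu20.04")
```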
Thanks for the fast responses as usual AgitatedDove14 🙂
AgitatedDove14 - it does have boto, but the clearml-serving installation and code refer to an older commit hash, and hence the task was not using them - https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L217
`create_task_from_function`
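(i.e. something along these lines - a sketch assuming the `Task.create_function_task()` helper is the relevant API here; the function and names are placeholders:)
```python
from clearml import Task

def transform(a, b):
    # placeholder body for the step logic
    return a + b

controller = Task.init(project_name="examples", task_name="controller")
# Wrap the function as its own sub-task that can later be cloned/enqueued;
# assuming create_task_from_function maps onto this helper.
step = controller.create_function_task(
    transform, func_name="transform", a=1, b=2
)
```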
I was looking at options to implement this just today, as part of the same remote debugging that I was talking about in this thread.
Any chance you can open a GitHub issue on it?
Will do!