I always have my notebooks in a git repo, but suddenly it's not running them correctly.
I have a wrapper SDK over clearml that includes a default conf; other settings are loaded from the secret manager / env vars as needed
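Roughly like this - a minimal sketch of the idea (the default values are made up; the two env var names are the standard clearml credential vars, but the merging logic is just illustrative):

```python
import os

# Defaults shipped inside the wrapper SDK (values here are illustrative).
DEFAULT_CONF = {
    "api_server": "https://api.clear.ml",
    "web_server": "https://app.clear.ml",
}

def load_conf():
    # Start from the bundled defaults, then pull anything sensitive from
    # env vars (populated from the secret manager) at load time.
    conf = dict(DEFAULT_CONF)
    conf["access_key"] = os.environ.get("CLEARML_API_ACCESS_KEY")
    conf["secret_key"] = os.environ.get("CLEARML_API_SECRET_KEY")
    return conf
```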
As in, if there are jobs, the first level is new pods and the second level is new nodes in the cluster.
AgitatedDove14 - any pointers on how to run GPU tasks with the k8s glue? How do I control the queue and differentiate tasks that need CPU vs GPU in this context?
Running multiple k8s_daemon instances, right? k8s_daemon("1xGPU") and k8s_daemon('cpu')?
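Something like this, I assume - a sketch of one glue daemon per queue, each in its own process (K8sIntegration and k8s_daemon() are from clearml-agent; the template_yaml kwarg exists in the glue example, but the template file names here are made up - check your agent version):

```python
# Sketch: one k8s glue daemon per queue, each with its own pod template
# (the GPU template requests GPUs, the CPU one doesn't).
from multiprocessing import Process

from clearml_agent.glue.k8s import K8sIntegration

def run_glue(queue_name, template_path):
    # Each daemon polls its queue and launches pods from its template.
    K8sIntegration(template_yaml=template_path).k8s_daemon(queue_name)

if __name__ == "__main__":
    Process(target=run_glue, args=("1xGPU", "gpu_template.yaml")).start()
    Process(target=run_glue, args=("cpu", "cpu_template.yaml")).start()
```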
AgitatedDove14 - the AWS autoscaler is not k8s native, right? That's the loose end I'm getting at.
For different workloads, I need to have different cluster autoscaler rules and account for different GPU needs.
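For the GPU case, the pod template handed to the glue could request GPU resources and target the GPU node group, so the cluster autoscaler only scales GPU nodes for GPU tasks - a sketch (the node-group label, toleration, and resource name are assumptions about the cluster setup):

```yaml
# Illustrative GPU pod template for the glue daemon.
apiVersion: v1
spec:
  nodeSelector:
    node-group: gpu            # schedule onto the GPU node group
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - resources:
        limits:
          nvidia.com/gpu: 1    # a pending pod here triggers GPU node scale-up
```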
Got it. I've never run a GPU workload in EKS before. Do you have any experience or things to watch out for?
Essentially. It's not about removing all WARNINGs, just removing this one, as it actually works right and the WARNING is wrong.
AgitatedDove14 - either, based on the scenario
```python
from clearml import Task

# Fall back to the current task's project name; if there is no current task
# and we are running remotely, initialize one and take its project name.
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
if project_name is None and not Task.running_locally():
    task = Task.init()
    project_name = task.get_project_name()
```
That makes sense - one part I am confused on: the Triton engine container hosts all the models, right? Do we launch multiple groups of these in different projects?
I also have a pipelines.yaml which I convert to a pipeline
I was having this confusion as well. Did the behavior of execute_remote change, so that what used to be Draft is now Aborted?
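For context, the call in question (the actual clearml method is execute_remotely; the project/queue names here are just examples):

```python
from clearml import Task

task = Task.init(project_name="ml-project", task_name="training")
# Stops the local run and enqueues the task for remote execution;
# the question is whether the task is then left in Draft or Aborted state.
task.execute_remotely(queue_name="default", clone=False, exit_process=True)
```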
CynicalBee90 - on the platform-agnostic aspect - dvc does it with the CLI, right? Is that what made you give it a green checkmark?
dataset1 -> process -> dataset2
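A sketch of that lineage with clearml's Dataset API (the calls are the real API; the project and dataset names are made up):

```python
from clearml import Dataset

# dataset1 -> process -> dataset2: dataset2 records dataset1 as its parent.
parent = Dataset.get(dataset_project="ml-project", dataset_name="dataset1")
files = parent.get_local_copy()
# ... process the files ...
child = Dataset.create(dataset_project="ml-project", dataset_name="dataset2",
                       parent_datasets=[parent.id])
child.add_files(files)
child.upload()
child.finalize()
```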
Example:
```yaml
name: ml-project
template: nbdev
pipelines_runner: gitlab
pipelines:
  pipeline-1:
    steps:
      - name: "publish-datasets"
        task_script: "mlproject/publish_datasets.py"
      - name: "training"
        task_script: "mlproject/training.py"
        parents: ["publish-datasets"]
      - name: "test"
        task_script: "mlproject/test.py"
        parents: ["training"]
```
Have a CLI which goes through each of the tasks and creates them.
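A rough sketch of what that CLI could look like for the yaml above (Task.create, PipelineController, add_step, and start are real clearml APIs; the yaml walking, version string, and queue defaults are illustrative):

```python
import yaml

from clearml import Task
from clearml.automation import PipelineController

def build_pipelines(path="pipelines.yaml"):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    for pipe_name, pipe_cfg in cfg["pipelines"].items():
        pipe = PipelineController(name=pipe_name, project=cfg["name"],
                                  version="1.0.0")
        for step in pipe_cfg["steps"]:
            # Create a draft task from the step's script...
            task = Task.create(project_name=cfg["name"],
                               task_name=step["name"],
                               script=step["task_script"])
            # ...and wire it into the DAG via its parents.
            pipe.add_step(name=step["name"], base_task_id=task.id,
                          parents=step.get("parents", []))
        pipe.start(queue="services")

if __name__ == "__main__":
    build_pipelines()
```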
Any chance you can open a GitHub issue on it?
Will do!
I guess the question is - I want to use the services queue for running services, and I want to do it on k8s
Yeah, I was trying it locally and it worked as expected. But locally I was creating a Task first and then seeing if it's able to get the project name from it
Basic question - I am running the clearml agent on an Ubuntu EC2 machine. Does it use docker by default? I thought it uses docker only if I add the --docker flag?
I don't want to, though. Will run it as part of a pipeline.
