I am seeing that it still picks up nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
Ok, the code suggests so. I'm looking for more powerful pipeline scheduling, e.g. triggering on dataset publish, actions on model publish, etc.
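For reference, roughly the kind of event-driven scheduling I mean - a minimal sketch using clearml.automation's TriggerScheduler (assuming a recent SDK version; the project names and task ID below are placeholders):
` from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# Launch an existing task whenever a dataset in the project is published
trigger.add_dataset_trigger(
    schedule_task_id="<task-id-to-clone>",   # placeholder
    schedule_queue="default",                # placeholder
    trigger_project="my_datasets_project",   # placeholder
    trigger_on_publish=True,
)

# Same idea for model publish events
trigger.add_model_trigger(
    schedule_task_id="<task-id-to-clone>",   # placeholder
    schedule_queue="default",                # placeholder
    trigger_project="my_models_project",     # placeholder
    trigger_on_publish=True,
)

trigger.start() `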
Thanks AlertBlackbird30
Is there some doc or relevant code on exactly what's happening? The behaviour has been random
Is it not possible to just say: look at my requirements.txt file and the imports in the script?
Cool, didn't know it was disabled. This exact reason is why I created a wrapper over ClearML for my use, so that people don't ever accidentally talk to the demo server
Here’s an example error I get trying it out on one of the example models:
` Error: Requested Model project=ClearML Examples name=autokeras imdb example with scalars tags=None not found. 'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition. `
Ok so that’s nothing more than what I would have configured in the clearml config then
Thoughts AgitatedDove14 SuccessfulKoala55 ? Some help would be appreciated.
I am going to be experimenting a bit as well, will get back on this topic in a couple of weeks 🙂
The description says this though
A section name associated with the connected object. Default: 'General'
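e.g. something like this (just a quick sketch; the project, task, and section names are made up):
` from clearml import Task

task = Task.init(project_name="examples", task_name="connect sections")  # placeholder names

params = {"lr": 0.001, "batch_size": 32}
# Without a name this lands in the default 'General' section;
# with a name it shows up under its own section in the task configuration.
task.connect(params, name="training") `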
I always have my notebooks in a git repo, but suddenly it's not running them correctly.
I have a wrapper SDK over clearml that includes a default conf, and the rest is loaded from a secret manager / env vars as needed
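Roughly along these lines (a stripped-down sketch; the default host and the secret-manager env var names are just illustrative, not part of ClearML):
` import os

from clearml import Task

# Hypothetical company default baked into the wrapper
DEFAULT_API_HOST = "https://api.clearml.mycompany.internal"


def init_task(project_name, task_name):
    # Point the SDK at our server before Task.init so nobody can
    # accidentally fall back to the public demo server.
    os.environ.setdefault("CLEARML_API_HOST", DEFAULT_API_HOST)
    # Credentials pulled from the secret manager / env vars as needed
    os.environ.setdefault("CLEARML_API_ACCESS_KEY", os.environ.get("SECRET_CLEARML_KEY", ""))
    os.environ.setdefault("CLEARML_API_SECRET_KEY", os.environ.get("SECRET_CLEARML_SECRET", ""))
    return Task.init(project_name=project_name, task_name=task_name) `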
This is the command that is running:
` ['docker', 'run', '-t', '-e', 'NVIDIA_VISIBLE_DEVICES=none', '-e', 'CLEARML_WORKER_ID=clearml-services:service:c606029d77784c69a30edfdf4ba291a5', '-e', 'CLEARML_DOCKER_IMAGE=', '-v', '/tmp/.clearml_agent.72r6h9pl.cfg:/root/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.clearml/pip-cache:/root/.cache/pip', '-v', '/root/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/root/.clearml/cache:/clea...
As in: if there are jobs, the first level is new pods, and the second level is new nodes in the cluster.
AgitatedDove14 - any pointers on how to run GPU tasks with the k8s glue? How do I control the queue and differentiate tasks that need CPU vs GPU in this context?
Running multiple k8s_daemon instances, right? k8s_daemon("1xGPU") and k8s_daemon('cpu')?
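i.e. something like this (a rough sketch based on clearml-agent's k8s_glue_example.py; the queue names and pod template files are placeholders, and constructor arguments may differ between versions):
` from multiprocessing import Process

from clearml_agent.glue.k8s import K8sIntegration


def run_daemon(queue_name, pod_template):
    # Each daemon polls one ClearML queue and spawns pods from its own
    # template (the GPU template would request nvidia.com/gpu resources).
    glue = K8sIntegration(template_yaml=pod_template)
    glue.k8s_daemon(queue_name)


if __name__ == "__main__":
    Process(target=run_daemon, args=("cpu", "pod_cpu.yaml")).start()
    Process(target=run_daemon, args=("1xGPU", "pod_gpu.yaml")).start() `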
AgitatedDove14 the AWS autoscaler is not k8s-native, right? That's sort of the loose point I'm getting at.
For different workloads, I need to have different cluster scaler rules and account for different GPU needs
Got it. I've never run a GPU workload in EKS before. Do you have any experience with it, and things to watch out for?
Essentially. It's not about removing all WARNINGs, but removing this one, since it actually works fine and the WARNING is wrong.
AgitatedDove14 either one, depending on the scenario:
` from clearml import Task

# Reuse the project of the currently running task, if there is one
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
# Otherwise, when executed remotely (e.g. by an agent), Task.init() attaches to the existing task
if project_name is None and not Task.running_locally():
    task = Task.init()
    project_name = task.get_project_name() `
That makes sense - one part I'm confused on: the Triton engine container hosts all the models, right? Do we launch multiple groups of these in different projects?
I also have a pipelines.yaml which I convert to a pipeline
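Basically something like this (a simplified sketch; the YAML layout with name/project/task/parents keys is just how mine happens to look, not a ClearML format):
` import yaml

from clearml import PipelineController


def build_pipeline(yaml_path="pipelines.yaml"):
    with open(yaml_path) as f:
        spec = yaml.safe_load(f)

    pipe = PipelineController(
        name=spec["name"], project=spec["project"], version=spec.get("version", "1.0.0")
    )
    for step in spec["steps"]:
        # Each step clones an existing base task and wires up its dependencies
        pipe.add_step(
            name=step["name"],
            base_task_project=step["project"],
            base_task_name=step["task"],
            parents=step.get("parents", []),
        )
    return pipe


if __name__ == "__main__":
    build_pipeline().start() `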