AgitatedDove14 is it possible to get the pipeline task running a step in a step? Is task.parent something that could help?
Yeah, when doing:
task.set_base_docker("nvidia/cuda:11.4.1-cudnn8-runtime-ubuntu20.04", docker_setup_bash_script=script)
AgitatedDove14 - I mean this - the signature says name=None but the text says the default is General.
The helm chart installs an agentservice; how is that related, if at all?
I have a wrapper SDK over clearml that includes a default conf; other settings are loaded from the secret manager / env vars as needed
GitLab has support for S3-based cache btw.
Yeah, please share some general active ones if you can, to discuss both the algo and engineering sides
With the human activity being a step where some manual validations, annotations, or feedback might be required
Is Task.current_task() creating a task?
As in, run a training experiment, then a test/validation experiment to choose the best model, etc., and also have a human validate sample results via annotations - all as part of a pipeline
I would prefer controlled behavior over whatever version happens to be available being used. Here we triggered a bunch of jobs that all went fine, and even the evaluations were fine, and then when we triggered an inference deploy it failed
Looks like Task.current_task() is indeed None in this case. A bit of the log is below, where I print(Task.current_task()) as the first step in the script
Environment setup completed successfully
Starting Task Execution: None
AgitatedDove14 either, depending on the scenario
AgitatedDove14 - it does have boto, but the clearml-serving installation and code refer to an older commit hash, and hence the task was not using them - https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L217
It completed after the max_job limit (10)
irrespective of what I actually have installed when running the script?
Ok, I did a pip install -r requirements.txt and NOW it picks them up correctly
Thoughts AgitatedDove14 SuccessfulKoala55 ? Some help would be appreciated.
Can you point me at relevant code in ClearML for the autoconnect so that I can understand exactly what's happening
Would this be a good use case to have?
Nope, that doesn’t seem to be it. Will debug a bit more.
Essentially: 1. run a task normally, 2. clone it, 3. edit it to have only those two lines.
Question - since this is a task, why is Task.current_task() None?
But that itself is running in a task right?
pipeline code itself is pretty standard
And anyway, once a model is published it can’t be updated, right? Which means there will be at least multiple published entries of the same model over time?
AgitatedDove14 - any pointers on how to run GPU tasks with the k8s glue? How do I control the queue and differentiate tasks that need CPU vs GPU in this context?