Thank you for the reply SmugDolphin23
Is there any possible workaround at the moment?
Hi WackyRabbit7 . Take a look at https://clear.ml/docs/latest/docs/references/sdk/task#taskget_task
I believe it describes your use case as example.
With pipelines it is even more complicated. In my case the pod for step 2 was evicted because it was using too much memory. The pod was terminated but the task was not marked as failed / aborted. Because of that, the pipeline controller pod kept running and the pipeline itself was also not marked as aborted / failed.
For a bit more context: let's say I have 2 experiments in "Project MLOps" called "Exp 1" and "Exp 2". When I publish "Exp 2" I want this trigger to pick up that event and start another task in some other project. But this task would need some information about "Exp 2", like its name, id or maybe its config object.
Does the trigger pass any context to the task which will be executed?
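One possible shape for this, as a minimal sketch: it assumes the function given to the trigger as schedule_function is called with the id of the task that fired the event, and that the details are then looked up with Task.get_task. The name on_publish and the get_task parameter are hypothetical and exist only so the sketch can be tried without a server.

```python
from types import SimpleNamespace
from typing import Callable, Optional


def on_publish(task_id: str, get_task: Optional[Callable] = None) -> dict:
    """Collect basic details about the task that fired the trigger.

    Assumption: the trigger calls this function with the id of the source task.
    """
    if get_task is None:
        # Imported lazily so the sketch also runs where clearml is not installed.
        from clearml import Task

        def get_task(tid):
            return Task.get_task(task_id=tid)

    source = get_task(task_id)
    # id and name are read from the task object returned by the lookup.
    return {"id": source.id, "name": source.name}


# Stand-in lookup for a dry run; a real run would leave get_task unset.
fake_lookup = lambda tid: SimpleNamespace(id=tid, name="Exp 2")
details = on_publish("abc123", get_task=fake_lookup)
print(details)
```

The returned dictionary could then be passed on to whatever task is started in the other project.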
Now for example the pod was killed because I had to replace the node. The task is stuck in "Running". Aborting from the UI says "experiment aborted successfully" but the state does not change.
I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.
I am trying to run with scale from zero k8s nodes for maximum cost savings. So a node should only be online if clearml actually runs a task. Waiting for the 2 hours timeout when running on expensive gpu instances for example is quite wasteful because the pipeline controller pod will keep the node online.
This is what I tried and it does not work because plot is no longer a data frame object, it is now a styler. The error comes from the fact that logger.report_table wants to do fillna on the data frame object. I can't seem to find a way to have the hyperlinks embedded in the data frame object. Any suggestions?
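One thing that might be worth trying, as a sketch only: keep a plain data frame and store the anchor markup as ordinary strings in the cells, so that fillna still works. Whether the links end up clickable depends on the table renderer showing HTML anchors from cell strings, which is an assumption here. The helper as_link and the example URLs are hypothetical.

```python
import html

import pandas as pd


def as_link(text: str, url: str) -> str:
    """Return an HTML anchor as a plain string, so the cell stays a normal value."""
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), html.escape(text))


df = pd.DataFrame(
    {
        "model": ["baseline", "tuned"],
        "url": ["https://example.com/runs/1", "https://example.com/runs/2"],
    }
)

# Build the link column from plain strings; df remains a DataFrame, not a Styler.
df["link"] = [as_link(name, url) for name, url in zip(df["model"], df["url"])]
df = df.drop(columns=["url"])

# fillna is available because df is still a regular DataFrame.
df = df.fillna("")

# Assumption: the renderer displays anchor markup found in cell strings.
# Logger.current_logger().report_table("links", "runs", iteration=0, table_plot=df)
print(df)
```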
CostlyOstrich36 it works like this
AgitatedDove14
I do believe triggers should be unique somehow because I find them way too easy to mishandle. Especially if used with schedule_function which is defined in the same script. Updating that function requires deleting the existing trigger task first and recreating it. If not done like this you just end up with 2 trigger tasks with the same name which I assume will respond to the same event(s) but do something slightly different in response. I assume it might work like this...
Hello CostlyOstrich36 I solved it by using a .sh script locally when I want to create/update the trigger. The sh script will chain 2 py scripts together. The first py script will take care of deleting the existing running trigger task and the second py script will be the one that will recreate the trigger task with the updated code.
It just seems strange to me that you could have 2 triggers that do different things but using the same name. Nothing that can't be worked around but for automa...
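The delete-then-recreate workaround described above can be sketched as a single function. This is not ClearML API: recreate_trigger and its three callables are hypothetical, and an in-memory dictionary stands in for the server so the logic can be run on its own.

```python
from typing import Callable, Dict, Iterable


def recreate_trigger(
    name: str,
    find_running: Callable[[str], Iterable[str]],
    stop: Callable[[str], None],
    create: Callable[[str], str],
) -> str:
    """Stop every running trigger task called `name`, then create a fresh one."""
    # Copy the ids first so stopping tasks cannot disturb the iteration.
    for task_id in list(find_running(name)):
        stop(task_id)
    return create(name)


# In-memory stand-in for the server: task id -> task name.
registry: Dict[str, str] = {"a1": "on-publish", "a2": "on-publish", "b1": "nightly"}


def find_running(name: str):
    return [task_id for task_id, task_name in registry.items() if task_name == name]


def stop(task_id: str) -> None:
    del registry[task_id]


def create(name: str) -> str:
    registry["new"] = name
    return "new"


new_id = recreate_trigger("on-publish", find_running, stop, create)
print(new_id, registry)
```

With ClearML, find_running could presumably wrap a task query filtered by name and status, and stop could wrap the call that stops a task. Both mappings are assumptions rather than something confirmed in this thread.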
What I would like to be able to do is basically get rid of the ".pipelines" project that gets created automatically
TimelyMouse69 The pipeline task(s) end up in a sub project called ".pipelines" no matter how I configure the PipelineController project name and target project. This .pipelines project is not visible from the "PROJECTS" section of the UI. You can only get to it from the PIPELINES view by clicking on "Full details" on a step.
Please see attached images
Ah I did not think to look for that option in the user's settings. That should do it. Thank you for the help 🙂
But the pre_execute_callback from the pipe.add_function_step needs to be fixed. It does run before the task is executed, but the Node does not have any attributes set besides the name.
No problem SmugDolphin23 and thank you. I am really quite stuck with this 😄
Thank you SmugDolphin23 I'll try it out.
If I right click on the initial pipeline Draft and hit "Run" from there, the new run wizard is populated with the default parameter values and uses the "set_default_execution_queue" value as the queue under "Advanced configuration".
That would match what add_dataset_trigger and add_model_trigger already have so it would be good
JuicyFox94 since I have you, the connection issue might be caused by the istio proxy. In order to disable the istio sidecar injection I must add an annotation to the pod.
https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml-agent/templates/agentk8sglue-configmap.yaml#L8
Unfortunately there does not seem to be any field for that in the values file.
SuccessfulKoala55 So this is the intended behavior? To always have to select the queue from "Advanced configuration" on the pipeline run window even though the "set_default_execution_queue" is set to the "default" queue?
Besides the fact that tasks will always have "k8s_scheduler" as the queue in the info tab so looking back at a task you will not be able to tell to which queue it was assigned.
Alright. I will keep it in mind. Thank you for the confirmation 🙂
actually it does not because the pods logs show .
yes that is possible but I do use istio for the clearml server components. I can move the agents to a separate namespace. I will try that
So it seems it starts on the queue I specify and then it gets moved to the k8s_scheduler queue.
So the experiment starts with the status "Running" and then once moved to the k8s_scheduler queue it stays in "Pending"