My specific use case is wanting to interact with a debugger.
Reviving this: do you recall what fixed this, or has anyone else run into this issue? I'm constantly getting this in my pipelines. If I run the exact same pipeline code / configuration multiple times, it will eventually run without a `User aborted: stopping task (3)`, but it's unclear what is happening on the runs that fail.
I guess what I'm confused about is that the final resolved environment is different between the first manual execution and the reproduced one -- the first runs perfectly fine, while the second crashes and fails to build the environment.
I am wondering if there is a way for me to connect to a currently running worker process and interact with anything in the script that expects user input. For example, if I submitted a task that had this as its script:
```python
# ... other stuff
import code
code.interact()
```
would there be any way for me to connect and actually use the interactive Python session it drops into?
After looking through the docs again, I don't know how I missed the ClearML Session page -- that seems like exactly what I want.
Yup, there was an agent listening to the `services` queue; it picked up the pipeline job and started to execute it. It just seems frozen at the point where it should be spinning up the tasks within the pipeline.
Woo, what a doozy. Thanks for the debug, @AgitatedDove14! Will move forward with your suggestions.
Yup! Have two queues: `services` with one worker spun up in `--services-mode`, and another queue (say `foo`) that has a bunch of GPU workers on it. When I start the pipeline locally, jobs get sent off to `foo` and executed exactly how I'd expect. If I keep everything exactly the same, and just change `pipeline.start_locally()` -> `pipeline.start()`, the pipeline task itself is picked up by the worker in the `services` queue, sets up the venv correctly, prints ` St...
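For reference, a minimal sketch of the setup I'm describing (project, queue, and step names like `foo` and `train_template` are just placeholders, not my real config); the only thing I change between the two behaviors is the last line:

```python
from clearml.automation import PipelineController

# The pipeline controller task itself goes to the "services" queue,
# where one agent is running in --services-mode
pipe = PipelineController(name="my_pipeline", project="my_project", version="0.0.1")

# Steps get sent to the GPU queue "foo"
pipe.add_step(
    name="train",
    base_task_project="my_project",
    base_task_name="train_template",
    execution_queue="foo",
)

# This works exactly as expected:
# pipe.start_locally()

# This gets picked up by the services worker, sets up the venv, then appears to hang:
pipe.start(queue="services")
```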
Yup, the code/git reference is there. Will private message you the log.
Hi AgitatedDove14, thanks for the response!
I'm a bit confused about the distinction / how to use these appropriately -- `Task.init` does not have `repo` / `branch` args to set what code the task should be running. Ideally, when I run the pipeline, I run the current branch of whoever is launching the pipeline, which I can do with `Task.create`. It also seems like `Task.init` will still make new tasks if artifacts are recorded? My ideal is that I do exactly what ` Task.c...
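To make the ask concrete, here's a rough sketch of what I mean with `Task.create` (the repo URL, branch, and script name are placeholders, not my actual setup):

```python
from clearml import Task

# Task.create points the task at an explicit repo/branch/script without
# having to run the code first
task = Task.create(
    project_name="my_project",
    task_name="train",
    repo="https://github.com/example-org/example-repo.git",  # placeholder repo
    branch="my-feature-branch",                               # placeholder branch
    script="train.py",
)

# versus Task.init, which registers the currently running script and commit,
# and has no repo/branch arguments:
# task = Task.init(project_name="my_project", task_name="train")
```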
But it is a bit confusing that the docs suggest accessing `node.job.task` even though `node.job` is being set to `None`.
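In case it helps show what I mean, a minimal sketch of the access pattern the docstring points at, with a guard for the `None` case (the step name and callback wiring are placeholders):

```python
from clearml.automation import PipelineController

def step_completed(pipeline: PipelineController, node: PipelineController.Node) -> None:
    # node.job can be None, so guard before following the docstring's node.job.task
    if node.job is not None and node.job.task is not None:
        print(f"step {node.name} ran as task {node.job.task.id}")
    else:
        print(f"step {node.name} has no job attached")

# attached when adding the step (other arguments omitted):
# pipe.add_step(name="train", ..., post_execute_callback=step_completed)
```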
I guess I'm just a bit confused by what the correct mental model is here. If I'm interpreting this correctly, I need to have essentially "template tasks" in my Experiments section whose sole purpose is to be copied for use in the Pipeline? When I'm setting up my Pipeline, I can't go "here are some brand new tasks, please run them", I have to go "please run existing task A with these modifications, then task B with these modifications, then task C with these modifications?" And when the pipeli...
Oooo I didn't notice the `base_task_factory` argument before -- that seems exactly like what I'd want. I will try that now! Thank you.
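For anyone finding this later, a rough sketch of what I'm going to try with `base_task_factory` (project, repo, branch, and step names are placeholders):

```python
from clearml import Task
from clearml.automation import PipelineController

def make_train_task(node: PipelineController.Node) -> Task:
    # The factory receives the pipeline node and returns a brand-new Task,
    # so no pre-existing "template task" is needed in the Experiments view
    return Task.create(
        project_name="my_project",
        task_name=f"step_{node.name}",
        repo="https://github.com/example-org/example-repo.git",  # placeholder
        branch="my-feature-branch",                               # placeholder
        script="train.py",
    )

pipe = PipelineController(name="my_pipeline", project="my_project", version="0.0.1")
pipe.add_step(name="train", base_task_factory=make_train_task, execution_queue="foo")
```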
I think the docstring is just a bit confusing, since it seems to directly recommend accessing `node.job.task` to access/modify things. I believe I have found a workaround for my specific case though, by using `pipeline.get_processed_nodes()` to grab some relevant info from the previously completed step.
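Concretely, the workaround looks roughly like this -- a sketch only, run from a later step's callback; the step names are placeholders and what I actually pull off the completed task differs in my case:

```python
from clearml.automation import PipelineController

def before_next_step(pipeline: PipelineController,
                     node: PipelineController.Node,
                     params: dict) -> bool:
    # Look through the nodes that have already run and pull info off the
    # completed step's task, guarding against job being None
    for done in pipeline.get_processed_nodes():
        if done.job is not None and done.job.task is not None:
            print(f"previous step {done.name} ran as task {done.job.task.id}")
    return True  # returning False would skip this step

# attached when adding the later step (other arguments omitted):
# pipe.add_step(name="evaluate", ..., pre_execute_callback=before_next_step)
```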