would it be on the pipeline task itself then, since that's what's disappearing?
that's likely the case
yeah, it just shows what I see in the Console, but then immediately goes back to polling for more work (so... instead of running the backtest, it exits with no completion message)
I will do some experiment comparisons and see if there are package diffs. thanks for the tip.
that's the final screenshot. it just shows a bunch of normal "launching ..." steps, and then stops all of a sudden.
the workers connect to the clearml server via ssh-tunnels, so they all talk to "localhost" despite being deployed in different places. each task creates artifacts and metrics that are used downstream
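for reference, each worker's clearml.conf points at the tunneled ports on localhost, roughly like this (a sketch using the default ClearML ports; the actual tunnel ports and keys are placeholders):
```
api {
    # all three endpoints resolve through the ssh tunnel
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        access_key: "WORKER_ACCESS_KEY"
        secret_key: "WORKER_SECRET_KEY"
    }
}
```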
when I run the pipe locally, I'm using the same connect.sh script as the workers do in order to poll the apiserver via the ssh tunnel.
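connect.sh is basically just port forwarding, something like this (rough sketch; the host and ports are placeholders, the real script differs):
```bash
#!/usr/bin/env bash
# forward the ClearML apiserver/webserver/fileserver ports over ssh
# so that every client can talk to "localhost"
ssh -N -f \
    -L 8008:localhost:8008 \
    -L 8080:localhost:8080 \
    -L 8081:localhost:8081 \
    user@clearml-server
```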
but maybe here's a clue. after hanging like that for a while... it seems like the agent restarts (the container it runs in does not)
I really can't provide a script that matches exactly (though I do plan to publish something like this soon enough), but here's one that's quite close / similar in style:
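(simplified sketch of the shape, not the actual script; project/task names are made up)
```python
from clearml import PipelineController

pipe = PipelineController(
    name="dynamic-pipeline",
    project="examples",
    version="0.0.1",
)

# steps are added dynamically, each cloned from a pre-existing template task
for i in range(3):
    pipe.add_step(
        name=f"process_{i}",
        base_task_project="examples",
        base_task_name="process-template",
        parameter_override={"General/worker_id": i},
    )

# the final step consumes the artifacts/metrics of the fan-out steps
pipe.add_step(
    name="backtest",
    parents=[f"process_{i}" for i in range(3)],
    base_task_project="examples",
    base_task_name="backtest-template",
)

pipe.start("default")
```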
there's also one where I tried function-steps out instead, but it's a similar architecture for the pipeline (the point of the example was to show how to do a dynamic pipeline)
enqueuing: `pipe.start("default")`
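the function-steps experiment was shaped roughly like this (illustrative only; the step bodies are stand-ins for the real work):
```python
from clearml import PipelineController

def process(worker_id: int):
    # runs as its own task on an agent
    return worker_id * 2

def backtest(result):
    print(f"backtesting with {result}")

pipe = PipelineController(
    name="dynamic-pipeline-fn",
    project="examples",
    version="0.0.1",
)

pipe.add_function_step(
    name="process",
    function=process,
    function_kwargs={"worker_id": 0},
    function_return=["result"],
)
pipe.add_function_step(
    name="backtest",
    function=backtest,
    # "${process.result}" resolves to the upstream step's return value
    function_kwargs={"result": "${process.result}"},
    parents=["process"],
)

pipe.start("default")  # enqueue the controller on the "default" queue
```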
but I think it's picking up on my local clearml install instead of what I told it to use.
my tasks have this in them... what's the equivalent for pipeline controllers?
the default queue is served by containerized venv workers with a custom entrypoint (agent services mode just wasn't working well for me, so I gave up on it)
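the entrypoint is roughly this (sketch; the path and queue name are placeholders):
```bash
#!/usr/bin/env bash
# bring up the ssh tunnel first, then serve the queue in venv mode
/opt/connect.sh
exec clearml-agent daemon --queue default --foreground
```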
Hi @<1689446563463565312:profile|SmallTurkey79> , when this happens, do you see anything in the API server logs? How is the agent running, on top of K8s or bare metal? Docker mode or venv?
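(for reference, with the standard docker-compose server deployment you can tail them with something like the following; the container name may differ in your setup)
```bash
docker logs --tail 200 -f clearml-apiserver
```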
let me downgrade my install of clearml and try again.
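e.g. pinning an older release (the version number here is just an example):
```bash
pip install "clearml==1.16.2"
```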