It's odd... I really don't see any tasks dying except the controller one.
Hi @<1689446563463565312:profile|SmallTurkey79>! You mentioned prior runs of this pipeline worked just fine. What SDK version were you using for those runs? Does this still happen if you revert to that version?
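If it helps, a quick way to confirm the SDK version in the environment that launches the pipeline (just a sketch):

```python
# Print the installed ClearML SDK version
import clearml

print(clearml.__version__)
```

Pinning back to the previously working version (e.g. `pip install clearml==<previous version>`) would tell us whether this is an SDK regression.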
Can you provide a script that imitates what you are doing?
In the pipeline you are running, are you creating new tasks/pipelines/datasets?
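Even a stripped-down version helps, e.g. something along these lines (a rough sketch only; project, queue, and step bodies are placeholders), as long as it still reproduces the controller dying:

```python
from clearml import PipelineDecorator


@PipelineDecorator.component(return_values=["data"], cache=True)
def make_data():
    # placeholder for the cached upstream steps
    return list(range(10))


@PipelineDecorator.component(return_values=["result"], cache=True)
def backtest(data):
    # placeholder for the final step that never seems to start
    return sum(data)


@PipelineDecorator.pipeline(name="repro-pipeline", project="debug", version="0.0.1")
def run_pipeline():
    data = make_data()
    backtest(data)


if __name__ == "__main__":
    # enqueue the steps the same way the failing pipeline does
    PipelineDecorator.set_default_execution_queue("default")
    run_pipeline()
```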
Did you take a look at my connect.sh script? I don't think it's the cause, since only the controller task is affected.
Is there some sort of culling procedure that kills tasks, by any chance? The lack of logs makes me think it's something like that.
I can also try different agent versions.
Do you have any STATUS REASON under the INFO section of the controller task?
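If the UI doesn't show anything useful there, the same information can also be pulled via the SDK. A rough sketch (the task ID is a placeholder, and the `status_reason` field name is an assumption on my part):

```python
from clearml import Task

# controller task ID copied from the UI (placeholder)
controller = Task.get_task(task_id="<controller_task_id>")

print(controller.get_status())        # current status, e.g. "stopped" or "failed"
print(controller.data.status_reason)  # reason recorded by the server, if any
```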
Yeah, it just shows what I see in the Console; the worker then immediately goes back to polling for more work (so instead of running backtest, it exits with no completion message).
The default queue is served by venv-mode workers running in containers with a custom entrypoint (agent services just wasn't working well for me, so I gave up on it).
So the worker thinks it's in venv mode, but it's actually containerized.
The apiserver is a Docker Compose stack.
I'll check the logs next time I see it.
Currently rushing to ship a model out, so I've just been running smaller experiments slowly, hoping to avoid the situation. Fingers crossed.
Hi @<1689446563463565312:profile|SmallTurkey79> , when this happens, do you see anything in the API server logs? How is the agent running, on top of K8s or bare metal? Docker mode or venv?
Odd, because I thought I was controlling this... maybe I'm wrong and the env is mis-set.

It happens consistently with this one task, which really should be fully cached.
I disabled cache in the final step and it seems to run now.
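For reference, that just meant flipping the cache flag on that one step, roughly like this (assuming the decorator-style pipeline; names are placeholders):

```python
from clearml import PipelineDecorator


# final step with caching turned off; upstream steps keep cache=True
@PipelineDecorator.component(return_values=["result"], cache=False)
def backtest(data):
    return sum(data)
```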
