Hey @<1523701070390366208:profile|CostlyOstrich36>, thanks for the suggestion!
Yes, I did manually run the same code on the worker node (e.g., using python3 llm_deployment.py), and it successfully utilized the GPU as expected.
What I’m observing is that when I deploy the workload directly on the worker node like that, everything works fine: the task picks up the GPU, logs stream back properly, and execution behaves normally.
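For reference, the direct run is just something like this (watching nvidia-smi in a second shell is simply how I confirm utilization):
```bash
# Run directly on the GPU worker node -- this works as expected
python3 llm_deployment.py

# In a second shell, watch GPU memory/utilization while the script runs
watch -n 1 nvidia-smi
```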
However, when I submit the same code using clearml-task from the control node (which schedules it to the same GPU-enabled worker), the task starts and even detects the GPU (e.g., it sees cuda:0), but doesn’t actually utilize it.
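The submission looks roughly like this (project, task name, and queue are placeholders rather than my exact values):
```bash
# Run from the control node; the queue is served by the GPU-enabled worker
clearml-task \
  --project llm_project \
  --name llm_deployment \
  --script llm_deployment.py \
  --queue gpu_queue
```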
Let me know if I might be missing something in the configuration. Really appreciate the help!