Hello!
I’m using ClearML on a Kubernetes cluster and have run into strange behavior when training a model from a non-main (master) branch. In my code (train.py + Hydra), I use task.set_script to specify the repository and branch, where the branch is a separate experimental branch. Everything was working fine until I changed the versions of some packages, after which the training started failing.
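For reference, the call looks roughly like this (the project name, repository URL, and branch name below are placeholders, not my real values):

from clearml import Task

task = Task.init(project_name="my-project", task_name="train")

# Point the task at the experimental branch instead of master
task.set_script(
    repository="git@github.com:me/myrepo.git",
    branch="feature/my-experiment",
)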
In the training logs, I can see that the packages with the new versions specified in setup.cfg are installed first, but then older packages from the setup.cfg of the master branch get installed on top of them. Additionally, when I connected to the Kubernetes pod, I saw that the code under /root/.clearml/vcs-cache/myrepo.git.bab58651b8533039258495e21cb16e0f/myrepo.git/ was also checked out on the master branch (see attached screenshot).
Could you please advise what might be causing this issue? I’ve tried clearing the pip cache, but that didn’t help. Have I perhaps missed something?
I’d appreciate any guidance!
Thanks in advance!