I hope I didn't miss anything
well, it's only when adding a - name
to the template
You need to mount it to ~/clearml.conf (i.e. /root/clearml.conf)
I have overridden that with the env var
it would be the same with a docker container and -v
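A minimal sketch of the mount described above, assuming the env var in question is CLEARML_CONFIG_FILE (the thread does not name it) and using a hypothetical image name. It only builds the docker argument list, so the -v mapping is visible without running anything:

```python
import os
from pathlib import Path


def docker_run_args(image, host_conf=None, container_conf="/root/clearml.conf"):
    """Build a `docker run` argv that mounts a clearml.conf into the container.

    host_conf falls back to CLEARML_CONFIG_FILE (assumed to be the env var
    meant in the thread), then to ~/clearml.conf.
    """
    source = host_conf or os.environ.get("CLEARML_CONFIG_FILE", "~/clearml.conf")
    host_path = Path(source).expanduser()
    # -v <host path>:<container path> bind-mounts the file into the container
    return ["docker", "run", "--rm", "-v", f"{host_path}:{container_conf}", image]


if __name__ == "__main__":
    # "my-agent-image" is a placeholder, not an image from the thread
    print(" ".join(docker_run_args("my-agent-image", "/tmp/clearml.conf")))
```

The container path is /root/clearml.conf because ~ resolves to /root when the container runs as root.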
then np
check if you have any more of those recovery reports in the mongo log, it should report progress
I think I have sent you all the existing logs
Now I suspect what happened is it stayed on another node, and your k8s never took care of that
that's an interesting theory
Thank you for helping!
I'll do that
yea, it's working
so if the node went down and then some other node came up, the data is lost
I will investigate a bit more and then check if I can recover
that would be a great solution
You can either use StrictHostKeyChecking=no
or generate a known_hosts file. I don't know of any other options
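The two options can be sketched as command lines, assuming a plain OpenSSH client; the host name is a placeholder. The functions only build argument lists, so nothing touches the network until you run them yourself:

```python
import shlex


def ssh_args_no_host_check(target):
    """Option 1: skip host key verification for this connection (less safe)."""
    return ["ssh", "-o", "StrictHostKeyChecking=no", target]


def keyscan_args(host):
    """Option 2: fetch the host's public keys, to be appended to known_hosts.

    -H hashes the host name in the output, matching hashed known_hosts entries.
    """
    return ["ssh-keyscan", "-H", host]


if __name__ == "__main__":
    # "git.example.com" is a hypothetical host, not one from the thread
    print(shlex.join(ssh_args_no_host_check("git@git.example.com")))
    print(shlex.join(keyscan_args("git.example.com")) + " >> ~/.ssh/known_hosts")
```

Disabling the check is quicker but accepts any host key. The known_hosts route keeps verification on, provided the scanned keys are checked once.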
That's a cool idea. Then you pass the tolerations definition through a different pod template?
AgitatedDove14 any chance you are familiar with this error: https://github.com/allegroai/clearml-agent/issues/55 ?
thank you for your time and support, I appreciate it!
Seems like this is not the best solution:
why not?
but the PV seems to be just a path to the labeled node
AgitatedDove14 so basically I am using my own docker image with all of our internal dependencies already installed, including python packages. Some of these packages (e.g. green-common) should be loaded automatically from the docker image.
I thought that maybe the virtualenv should inherit the global site-packages via --system-site-packages: https://stackoverflow.com/questions/12079607/make-virtualenv-inherit-specific-packages-from-your-global-site-packages
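A small sketch of what --system-site-packages does, using the standard-library venv module rather than the virtualenv tool (both accept a flag with that name). It assumes nothing about how the agent itself creates its environment. The function creates an environment that can see globally installed packages and returns the setting recorded in pyvenv.cfg:

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(env_dir):
    """Create a venv that inherits the global site-packages.

    Returns the value of include-system-site-packages from pyvenv.cfg.
    """
    # with_pip=False keeps creation fast; inherited packages stay visible anyway
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip()
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_inheriting_venv(Path(tmp) / "env"))
```

With this setting, packages already installed in the docker image's global site-packages can be imported inside the environment without reinstalling them.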
oh cool, didn't know about this one
Should I make a new issue or just reply on the one I mentioned above?