
I have multiple agents not sharing /root/.trains
Today I’m OOO but I can give an initial suggestion: when dealing with resource usage issues, logs are important but metrics can help a lot more. If you don’t have it, install a Grafana stack so we can see the resource metric history before we get the OOM. This helps to understand if we are really using a lot of RAM or the problem is somewhere else.
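Something like this should get you basic resource metrics (just a sketch, assuming Helm and the community kube-prometheus-stack chart, with a release name `monitoring` that I made up):
```
# Add the prometheus-community repo and install kube-prometheus-stack,
# which bundles Prometheus plus Grafana with node/pod resource dashboards.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Port-forward Grafana locally to browse memory usage history before the OOM.
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
```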
today I'm in the middle of sprint planning for my team so I probably won't be fast to help if needed, but feel free to ping me just in case (I will try to do my best)
I’m not totally sure atm but you can try to set the env var CLEARML_API_HOST_VERIFY_CERT="false"
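Something along these lines if you run the agent from a shell (just a sketch; where you set the variable depends on how your deployment is wired):
```
# Disable TLS certificate verification for the ClearML API host (testing only).
export CLEARML_API_HOST_VERIFY_CERT="false"

# Then start the agent as usual, e.g.
clearml-agent daemon --queue default
```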
iptables is used by Docker itself, so you need to be careful when modifying it: https://docs.docker.com/network/packet-filtering-firewalls/
with that said, I’d start by working on localhost to focus on the real problem, and then move outward
it’s alongside the health checks tab
did you try to create a debug pod with a mount using the Ceph storageclass? you can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ then add the PVC and the mount (see the sketch below). then you should exec into the pod and try to write a dummy file on the mount; I suspect the problem is there
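Roughly something like this (just a sketch; `ceph-block` is a made-up StorageClass name, use yours):
```
# Create a test PVC on the Ceph StorageClass and a sleep pod that mounts it.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block   # placeholder name, use your Ceph StorageClass
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: debug-sleep
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/test
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ceph-test-pvc
EOF

# Exec in and try to write a dummy file on the mount.
kubectl exec -it debug-sleep -- sh -c "echo hello > /mnt/test/dummy && ls -l /mnt/test"
```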
Do you have Ingresses enabled?
you'll probably see that it’s not able to do it, and that should be related to the k8s config
```
# Point to the internal API server hostname
APISERVER=https://kubernetes.default.svc

# Path to ServiceAccount token
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount

# Read this Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)

# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICEACCOUNT}/token)

# Reference the internal certificate authority (CA)
CACERT=${SERVICEACCOUNT}/ca.crt

# Explore the API with TOKEN
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api
```
if the mounts are already there on every node, you can also mount them directly on the nodes in a specific folder and then use the Rancher local-path provisioner (sketch below)
if you already have data over there you may import it
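If you go that way, installing the local-path provisioner is basically this (a sketch using the upstream manifest and its defaults):
```
# Install the Rancher local-path provisioner (creates the "local-path" StorageClass).
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

# Check the StorageClass is there; PVCs using it are provisioned as folders
# on the node's local disk (by default under /opt/local-path-provisioner).
kubectl get storageclass local-path
```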
ya sure, I was referring to creating a new PVC just for the test
Yes, one interesting piece of info would be: what dynamic storage provisioner (StorageClass) are you using?
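You can check with something like this (`<your-storageclass>` is just a placeholder):
```
# List StorageClasses and their provisioners; the default one is marked "(default)".
kubectl get storageclass
kubectl describe storageclass <your-storageclass>
```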
If you have an ALB you will just need to add some annotations on the ingress rules, depending on your setup. btw for now, since you already have everything in place, I suggest just adding entries to /etc/hosts and seeing if it works
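Something like this (the hostnames and IP here are just placeholders, use the ones from your ingress/ALB):
```
# Map the ClearML hostnames to the ingress/ALB IP locally (example values only).
echo "203.0.113.10  app.clearml.local api.clearml.local files.clearml.local" | sudo tee -a /etc/hosts
```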
in a few seconds it should become green
Hi @<1523701717097517056:profile|ScantMoth28> , atm we are not supporting Istio, but I’m more than willing to look at a proposal like yours. Let’s discuss this in a new issue on GitHub pls so we can keep track of it and find a good way to implement it. thanks
otherwise yes, if this is not an option, you can also mount what already exists, so pls open an issue in the new helm chart repo and we can find a solution
yep, but this is not how it should work in-pod
btw a good practice is to keep infrastructure stuff decoupled from applications. What about using https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner ? After applying that chart you can simply use the generated storage class (see the sketch below); wdyt?
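For example (a sketch, assuming an existing NFS server; the server address and export path below are made up):
```
# Install the NFS subdir external provisioner pointing at an existing NFS export.
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=10.0.0.5 \
  --set nfs.path=/exports

# The chart creates a StorageClass (default name: nfs-client) you can reference in PVCs.
kubectl get storageclass nfs-client
```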
I guess the message may be misleading. Pls share the output of kubectl get svc for the namespace where you installed ClearML
I don’t think it’s related to how the agent talks with the apiserver or fileserver. It’s more related to the fact that kubectl inside the agent pod cannot contact the Kubernetes apiserver
but it’s just a quick guess, not sure if I’m right
you can work around the issue by mounting the kubeconfig (rough sketch below), but I guess the issue still needs to be investigated
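The workaround idea would be something like this (a rough sketch; the namespace, secret name, and paths are made up, and the right place to patch depends on how the agent is deployed):
```
# Create a secret from a kubeconfig that is known to reach the apiserver.
kubectl -n clearml create secret generic agent-kubeconfig \
  --from-file=config=$HOME/.kube/config

# Then mount it in the agent pod/deployment spec so the in-pod kubectl picks it up, e.g.:
#   volumes:
#   - name: kubeconfig
#     secret:
#       secretName: agent-kubeconfig
#   containers:
#   - name: clearml-agent
#     env:
#     - name: KUBECONFIG
#       value: /root/.kube/config
#     volumeMounts:
#     - name: kubeconfig
#       mountPath: /root/.kube
```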