ya sure, that's what I was referring to: create a new PVC just for the test
some suggestions:
- start working with just ClearML (no agent or serving; those go in after ClearML is working)
- try a first deploy without any overrides
- if it works, start adding values to the override file (don't put everything in it or it will be very difficult to debug; the override file should only contain what is actually overridden)
- do helm upgrade
- check problems one by one
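A minimal sketch of that workflow, assuming the chart from the ClearML Helm repo and a release and namespace both named `clearml` (adjust these names to your setup):

```shell
# Keep the override file minimal: only the values you actually change.
cat > clearml-values-override.yaml <<'EOF'
# example override: only keys that differ from the chart defaults
apiserver:
  replicaCount: 1
EOF

# First deploy with no overrides at all:
helm upgrade --install clearml clearml/clearml -n clearml --create-namespace

# Once that works, re-run with the override file and debug issues one by one:
helm upgrade --install clearml clearml/clearml -n clearml -f clearml-values-override.yaml
```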
this one should not be needed for asyncdelete; what is the error you are getting?
(and each queue has its own basePodTemplate)
From a k8s perspective a pod is ephemeral, so if it's gone for any reason it's gone. Obviously there are structures that can ensure a running state (like Deployments or StatefulSets) so that if a pod dies, another one takes its place. We didn't go in this direction because pods are not idempotent, so it's not straightforward to simply replace them. Btw this looks like an interesting topic to me so I'd like to include SuccessfulKoala55 on this, also because I'm involved more in the infra side of the equation and I ma...
then I enqueue it and it's created but obv empty
It happened to me when trying many installations; can you log in using the http://app.clearml.home.ai/login URL directly?
Just a quick suggestion since I have some more insight on the situation. Maybe you can look at Velero, it should be able to migrate data. If not, you can simply create a fresh new install, scale everything to zero, then create a debug pod mounting the old and new PVCs and copy the data between the two. It sounds more complex than it actually is.
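A rough sketch of the debug-pod approach; the PVC names `old-pvc`/`new-pvc` and the namespace `clearml` are placeholders for your actual ones:

```shell
# Scale workloads down first so nothing writes to the volumes:
kubectl -n clearml scale deployment --all --replicas=0

# Debug pod mounting both PVCs:
cat <<'EOF' | kubectl -n clearml apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-copy
spec:
  containers:
  - name: copy
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - { name: old, mountPath: /old }
    - { name: new, mountPath: /new }
  volumes:
  - name: old
    persistentVolumeClaim: { claimName: old-pvc }
  - name: new
    persistentVolumeClaim: { claimName: new-pvc }
EOF

# Copy the data across, then clean up:
kubectl -n clearml exec pvc-copy -- sh -c "cp -a /old/. /new/"
kubectl -n clearml delete pod pvc-copy
```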
if you do a kubectl get svc in the namespace you should see the services for the apiserver, webserver and fileserver
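For example (the namespace name is a placeholder, as are the exact service names, which depend on your release name):

```shell
kubectl -n clearml get svc
# expect to see entries along the lines of:
#   clearml-apiserver, clearml-webserver, clearml-fileserver
```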
uh... yes, I was focusing on PipelineController but it's a task property. Ty, it worked!
ok, but if you describe the pod you should at least see the termination cause
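Something like this (pod name and namespace are placeholders):

```shell
kubectl -n clearml describe pod <pod-name>
# Look at "Last State" / "Reason" (e.g. OOMKilled, Error) and at the
# Events section at the bottom for why the pod terminated.
```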
Ofc it’s possible to add this to the chart but, as @<1523701205467926528:profile|AgitatedDove14> said, it’s not recommended to go directly over the public internet with it. Regardless, @<1556812486840160256:profile|SuccessfulRaven86> do you have any PR to propose for it? It would be great to have something to discuss in GH.
this sounds weird to me
I wouldn’t say it’s related to RBAC, because the issue seems networking-related (the connection timed out)
you can work around the issue by mounting the kubeconfig, but I guess the root cause still needs to be investigated
but it’s just a quick guess, not sure if I’m right
yep, but this is not how it should work with the in-pod approach
just my two cents
try this inside the pod
this means network issues at some level
```
# Point to the internal API server hostname
APISERVER=https://kubernetes.default.svc
# Path to ServiceAccount token
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
# Read this Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICEACCOUNT}/token)
# Reference the internal certificate authority (CA)
CACERT=${SERVICEACCOUNT}/ca.crt
# Explore the API with TOKEN
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api
```
I guess yes, but honestly I’m not sure you will get the right results
probably
is not accessible from your pod
if it returns a 503 it’s not the network but something on top of it
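One way to tell the two cases apart from inside the pod: a curl connection timeout (exit code 28) points at the network, while an HTTP 5xx means the request got through and the problem sits above the network layer. A small helper sketching that logic (the function name and messages are my own, not from the thread):

```shell
# classify_result RC HTTP_CODE -> prints a one-line diagnosis.
# RC is curl's exit code (28 = connection timed out); HTTP_CODE is
# what `curl -w '%{http_code}'` printed.
classify_result() {
  rc=$1; code=$2
  if [ "$rc" -ne 0 ]; then
    echo "network problem (curl exit code $rc)"
  elif [ "$code" -ge 500 ]; then
    echo "reached the server, app-level error $code"
  else
    echo "HTTP $code, looks fine"
  fi
}

# Typical use (URL is the one from this thread):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -m 5 http://app.clearml.home.ai/login)
#   classify_result $? "$code"
```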