
Do you have Ingresses enabled?
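As a reference, ingresses are typically toggled per component in the chart values; a rough sketch, assuming a layout like the one below (the hostname is a placeholder and the exact keys may differ between chart versions, so check the chart’s values.yaml):

```yaml
# sketch only: enable an ingress for the apiserver
apiserver:
  ingress:
    enabled: true
    hostName: api.clearml.example.com
```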
there’s a PR coming with example values: https://github.com/allegroai/clearml-helm-charts/pull/234
I’m not totally sure atm, but you can try setting the env var CLEARML_API_HOST_VERIFY_CERT="false"
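If the agent runs in k8s, a minimal sketch of where that variable would go (this is a generic container env block; the exact place depends on your chart or deployment):

```yaml
# sketch: add to the agent container's environment
env:
  - name: CLEARML_API_HOST_VERIFY_CERT
    value: "false"
```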
Hi everyone, I just fixed the releases, so new charts containing this fix are now published. ty!
Just one more bit of info: atm I’ve only tested Elastic v7.10.*; I haven’t tested 7.11/7.12/7.13 yet
if you do a kubectl get svc in the namespace you should see the services for the apiserver, webserver and fileserver
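For example (the namespace is a placeholder):

```bash
# you should see services for the apiserver, webserver and fileserver
kubectl -n clearml get svc
```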
I suggest exec’ing into the agent pod and trying a simple kubectl command like kubectl get pod
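Something like (namespace and pod name are placeholders):

```bash
kubectl -n clearml get pods                                   # find the agent pod name
kubectl -n clearml exec -it <agent-pod> -- kubectl get pod    # check it can reach the k8s API
```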
this is clearly an issue with the provisioner not handling the PVC request for any pod that has a PVC. It’s not related to the chart but to the provisioner you are using, which probably doesn’t support dynamic allocation. What provisioner are you using?
this is a connection failure from the agent to the apiserver. The flow should be agent pod -> apiserver svc -> apiserver pod. The apiserver may also have something in its logs worth checking
I wouldn’t say it’s related to RBAC, because the issue seems networking related (connection timed out)
This is clearly a network issue; first I’d check that there were no restarts of the apiserver during that timespan. It’s not easy to debug since it looks random, but it could be worth reviewing the overall k8s networking configuration just to be sure.
you can work around the issue by mounting the kubeconfig, but I guess the issue still needs to be investigated somehow
you can try using a specific image like docker.io/arm64v8/mongo:latest
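If this is with docker-compose, a sketch of overriding the image (the service name depends on your compose file):

```yaml
# sketch: pin the mongo service to an arm64 image
services:
  mongo:
    image: docker.io/arm64v8/mongo:latest
```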
if you have problems with other images, I suggest running Docker in emulation mode so you can run amd64 images
ok, the issue must be there; after the first creation nothing is there
is there anything specific you need?
additionalConfigs is a section under apiserver
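A hedged sketch of how that section looks in the values file (the file name and contents here are placeholders; check the chart’s values.yaml for the exact shape):

```yaml
# sketch: extra config files made available to the apiserver
apiserver:
  additionalConfigs:
    apiserver.conf: |
      # your extra apiserver configuration here
```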
It’s about strategy. If you have the ClearML server installed on k8s, I guess you want to run tasks on the same k8s cluster. In that case the latest clearml-agent chart is the way to go (see the values sketch after this list); it uses the glue agent under the hood. Basically what happens is that the agent will spin up a new pod whenever a new task is enqueued in the related queue. At that point it’s k8s’s duty to have enough resources to spawn the pod, and this can be achieved in two ways:
- you already have enough resources there
- you have a k8s autoscaler that can sp...
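A minimal sketch of the clearml-agent chart values for that setup (the key names are from memory and may differ between chart versions, so check the chart’s values.yaml before using them):

```yaml
# sketch: the glue agent watches this queue and spawns one pod per enqueued task
agentk8sglue:
  queue: default
```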
so you are using docker-compose?
did you try creating a debug pod with a mount using the ceph storageclass? You can start from here https://downey.io/notes/dev/ubuntu-sleep-pod-yaml/ then add the PVC and the mount. Then you should exec into the pod and try to write a dummy file on the mount; I suspect the problem is there
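Roughly what that debug manifest could look like (the storage class name, image and size are placeholders; adjust to your cluster):

```yaml
# sketch: PVC on the ceph storage class + a sleep pod mounting it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debug-ceph-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-rbd   # placeholder: use your ceph storage class
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleep
spec:
  containers:
    - name: ubuntu
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: test-vol
          mountPath: /mnt/test
  volumes:
    - name: test-vol
      persistentVolumeClaim:
        claimName: debug-ceph-pvc
```

Then exec into it and try writing the dummy file, e.g. `kubectl exec -it ubuntu-sleep -- sh -c 'echo ok > /mnt/test/dummy'`.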
in Enterprise we support multiqueueing but it’s a different story
ReassuredArcticwolf33 PR is coming https://github.com/allegroai/clearml-helm-charts/pull/84
you need a provisioner that supports dynamic provisioning
btw a good practice is to keep infrastructural stuff decoupled from applications. What about using https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner ? After applying that chart you can simply use the generated storage class; wdyt?
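Installing it is roughly this (taken from that project’s README; the NFS server and path are placeholders):

```bash
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=x.x.x.x \
  --set nfs.path=/exported/path
```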
I guess yes but honestly I’m not sure you will get the right results
ok, but if you describe the pod you should at least see the termination cause
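i.e. something like (namespace and pod name are placeholders):

```bash
kubectl -n clearml describe pod <pod-name>   # check Events and Last State for the termination reason
```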
Hi, not really sure if there is any problem with the GitHub CDN, but it looks fine to me right now: https://github.com/allegroai/clearml-helm-charts/issues/155