today it's pretty busy for me but I can try to help if needed; please put any questions here and I will try to answer when possible
this is the chart with various configurable groups of agents: https://artifacthub.io/packages/helm/valeriano-manassero/clearml
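quick install sketch for reference; the repo URL here is my assumption, double-check it on the Artifact Hub package page:

helm repo add valeriano-manassero https://valeriano-manassero.github.io/helm-charts   # repo URL assumed, verify on Artifact Hub
helm repo update
helm install clearml valeriano-manassero/clearml -n clearml --create-namespace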
SuccessfulKoala55 yes, no autoscaler on that chart. Maybe I'm missing the point, but the request was for an "on-premise" setup so I guessed no AWS. If I missed the point, everything I posted is not useful 😄
Interesting use case, maybe we can create multiple k8s agents for different queues
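rough sketch of what I mean, one agent release per queue; the agentk8sglue.queue value path is an assumption on my side, check the chart's values.yaml:

# assumes the allegroai repo is already added (https://allegroai.github.io/clearml-helm-charts)
helm install agent-cpu allegroai/clearml-agent -n clearml --set agentk8sglue.queue=cpu-queue   # value path assumed
helm install agent-gpu allegroai/clearml-agent -n clearml --set agentk8sglue.queue=gpu-queue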
Hey ApprehensiveSeahorse83, I didn’t forget about you, it’s just a busy time for me; I will answer during the day after a couple more tests on my testing env.
Exactly, these are system accounts
it would be great to get logs from the apiserver and fileserver pods when deleting a file from the UI so we can see what is going on. I’m saying this because, at first glance, I don’t see any issue in your config
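something like this; the deployment names assume a release called clearml, adjust to yours:

kubectl logs -n clearml deploy/clearml-apiserver --tail=100 -f    # watch while deleting the file from the UI
kubectl logs -n clearml deploy/clearml-fileserver --tail=100 -f   # in a second terminal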
hi, if I’m not wrong, MongoDB doesn’t have an ARM image (and you are using an Apple Silicon machine like me, ofc 😄)
you can try using a specific image like docker.io/arm64v8/mongo:latest
not official, but it should work
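if you're installing via the chart, you can point the mongodb subchart at that image; the value paths below assume a Bitnami-style mongodb dependency, verify against the chart's values.yaml:

helm upgrade clearml valeriano-manassero/clearml -n clearml \
  --set mongodb.image.repository=arm64v8/mongo \
  --set mongodb.image.tag=latest   # value paths assumed, check values.yaml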
please also fix fileServerUrlReference: and webServerUrlReference:
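for example (hostnames are placeholders, and the values may be nested differently in your values.yaml):

helm upgrade clearml valeriano-manassero/clearml -n clearml \
  --set fileServerUrlReference=https://files.example.com \
  --set webServerUrlReference=https://app.example.com   # hostnames are placeholders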
what does kubectl get svc -n clearml return?
the agent runs the command inside the pod, just like you did when exec'ing into the pod and launching it manually. If one returns 127 while the manual run is fine, it looks to me like the command being issued is not the same. What chart version are you using?
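to compare apples to apples, exec into the pod and rerun the exact command from the task log (pod name is a placeholder):

kubectl exec -it -n clearml <agent-pod-name> -- /bin/bash
# inside the pod: 127 means "command not found", so check the binary and PATH
which clearml-agent
clearml-agent --version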
the goal is to get the healthchecks green so the ALB should be able to work; change the healthcheck path from / to /debug.ping
look also at the monitoring tab
with the right svc names
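if you're on the AWS Load Balancer Controller, the healthcheck path can be set with an annotation; the ingress name is a placeholder:

kubectl annotate ingress <clearml-ingress> -n clearml \
  alb.ingress.kubernetes.io/healthcheck-path=/debug.ping --overwrite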
Hi, that's usually related to IPv6/IPv4 stack configuration in your k8s cluster. Are you using just one specific stack?
what does kubectl get svc -n default return?
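you can also check which IP families each service got assigned:

kubectl get svc -n clearml -o custom-columns=NAME:.metadata.name,FAMILIES:.spec.ipFamilies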
btw in k8s we abandoned the usage of services since they're not needed anymore. you can put an agent consuming a queue and enqueue tasks to it
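e.g. with the CLI (project/queue/script names are placeholders, check clearml-task --help for the exact flags):

clearml-task --project MyProject --name test-task --script train.py --queue my-queue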
on OSS it’s usually the only way: one agent deployment for every queue you define
could it be possible to enable IPv6 (even without using it) at the network layer, to check whether this is really the issue?
BoredBluewhale23 I can reproduce the issue, working on it
perfect, now the whole process is clear to me
today I'm in the middle of sprint planning for my team so I will probably not be fast to help if needed, but feel free to ping me just in case (I will try to do my best)
later in the day I will also push a new clearml chart that will no longer contain the k8s glue, since it's now in the clearml-agent chart; this is why I was suggesting to use that chart :)
I suggest exec'ing into the agent pod and trying some kubectl command, like a simple kubectl get pod
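e.g. (pod name is a placeholder):

kubectl exec -it -n clearml <agent-pod-name> -- kubectl get pods
# a forbidden/error response here usually means the agent's serviceaccount lacks RBAC permissions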
if you have problems with other images, I suggest running Docker in emulation mode so you can run amd64 images
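e.g. forcing the platform explicitly (works if QEMU/Rosetta emulation is available in your Docker setup):

docker run --platform linux/amd64 mongo:latest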
this will make the autoscaler's life easier, knowing exactly how many resources you need
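for example, setting explicit requests on the task pods spawned by the agent; the basePodTemplate value path is an assumption on my side, check the clearml-agent chart's values.yaml:

helm upgrade clearml-agent allegroai/clearml-agent -n clearml \
  --set agentk8sglue.basePodTemplate.resources.requests.cpu=2 \
  --set agentk8sglue.basePodTemplate.resources.requests.memory=8Gi   # value path assumed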