Can you please share the output of helm list in the clearml namespace?
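Something like this should do it. It is just a small sketch: the function name is made up, and it assumes the charts were installed in a namespace called clearml (pass another name if not).

```shell
# Hypothetical helper; "clearml" as the default namespace is an assumption.
list_clearml_releases() {
  ns="${1:-clearml}"
  # -a also lists failed or pending releases, which helps when debugging
  helm list -a --namespace "$ns"
}
```

Call it as `list_clearml_releases`, or as `list_clearml_releases my-namespace` if the charts live elsewhere.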
If you instruct the apiserver to use S3, the fileserver will basically not be used anymore (I need SuccessfulKoala55's confirmation to be 100% sure, I'm more of an infra guy :D)
There's an incomplete PR for this.
At task completion, do you get the state Completed in the UI?
Hi Tom, let's try to debug. Did you install all the charts in the same namespace? Did you generate a key/secret pair from the UI and then use it only in the agent and serving charts?
` ❯ clearml-task --version
ClearML launch - launch any codebase on remote machine running clearml-agent
usage: clearml-task [-h] [--version] [--project PROJECT] --name NAME [--repo REPO] [--branch BRANCH]
[--commit COMMIT] [--folder FOLDER] [--script SCRIPT] [--cwd CWD] [--args [ARGS [ARGS ...]]]
[--queue QUEUE] [--requirements REQUIREMENTS] [--packages [PACKAGES [PACKAGES ...]]]
[--docker DOCKER] [--docker_args DOCKER_ARGS]
... `
I think so, at least this is what I saw in the docs.
If you need a non-automated way to create the cluster, I suggest considering only the Helm chart.
I need to investigate. ScrawnyLion96, can you please open an issue on https://github.com/allegroai/clearml-helm-charts ?
Not a big issue, but maybe worth a quick fix.
In this case I suggest giving k8s-glue a try; it is there by default in the latest chart version.
this configuration object is stored as a file in /root/.trains
?
As usual it starts small, and after 5 minutes the discussion is getting challenging 😄 I love this stuff... Let me think a bit about it and I will get back to you asap on this.
This is clearly a network issue; first I'd check that there are no restarts of the apiserver during that timespan. It's not easy to debug since it looks random, but it is worth checking the overall k8s networking configuration just to be sure.
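A quick way to look at restarts could be the sketch below. The function name is made up, and it assumes the namespace is clearml and that the apiserver pod names contain "apiserver".

```shell
# Hypothetical helper; namespace and pod-name pattern are assumptions.
check_apiserver_restarts() {
  ns="${1:-clearml}"
  # Keep the header row plus the apiserver pods; the RESTARTS column
  # shows how many times each pod's containers were restarted
  kubectl get pods --namespace "$ns" | grep -E 'NAME|apiserver'
}
```

If RESTARTS is greater than zero, `kubectl describe pod` on that pod shows when and why the last restart happened.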
I'm going to investigate (and fix it if possible) in a few days.
This is clearly an issue with the provisioner not handling the PVC request for any pod that has a PVC. It's not related to the chart but to the provisioner you are using, which probably doesn't support dynamic allocation. What provisioner are you using?
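If you are not sure which provisioner is in use, a sketch like this shows it. The function name is made up and the clearml namespace is an assumption.

```shell
# Hypothetical helper; "clearml" as the default namespace is an assumption.
inspect_storage() {
  ns="${1:-clearml}"
  # The PROVISIONER column names the provisioner behind each StorageClass;
  # the default class, if any, is marked "(default)"
  kubectl get storageclass
  # PVCs stuck in Pending usually mean nothing is provisioning volumes for them
  kubectl get pvc --namespace "$ns"
}
```

No StorageClass at all, or PVCs that stay Pending, would point to missing dynamic provisioning rather than to the chart.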
Hi ApprehensiveSeahorse83, today we released the clearml-agent chart, which just installs the glue agent. My suggestion is to disable k8s glue and any other agent in the clearml chart and install more than one clearml-agent chart in different namespaces. This way you will be able to have a k8s glue for every queue (cpu and gpu).
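Roughly, one install per queue could look like the sketch below. Everything named here is an assumption for illustration: the function name, the release and namespace names, the values file names, and the charts repository already being added locally under the alias allegroai. Each values file would hold the queue name and credentials for that agent.

```shell
# Hypothetical helper; release names, namespaces, values files and the
# "allegroai" repo alias are all placeholders.
install_agent() {
  queue="$1"
  # One release per queue, each in its own namespace
  helm install "clearml-agent-$queue" allegroai/clearml-agent \
    --namespace "clearml-agent-$queue" --create-namespace \
    --values "values-$queue.yaml"
}
```

Then `install_agent cpu` and `install_agent gpu` would give two separate glue agents, one per queue.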
Oh, it would be interesting if you could include it in the chart and push a PR :D
Can the k8s cluster access the Ubuntu archive?
In a few seconds it should become green.
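One way to test it from inside the cluster is a throwaway pod, as in this sketch. The function name, the pod name, and the image tag are arbitrary choices for illustration.

```shell
# Hypothetical helper; pod name and image tag are placeholders.
check_ubuntu_archive() {
  # Runs apt-get update in a temporary pod and deletes the pod afterwards;
  # --rm needs an attached container, hence --attach
  kubectl run apt-check --rm --attach --restart=Never \
    --image=ubuntu:22.04 -- apt-get update
}
```

If `apt-get update` hangs or fails to resolve the archive hosts, the pods have no outbound access to the Ubuntu archive.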