Reputation
Badges 1
17 × Eureka!And also another question came to my mind. When changing any deployment for clearml like apiserver or mongo or elasticsearch etc. do I have to redeploy everything from the scratch? I had some problems previously when changing something in apiserver forced me to redeploy everything in order for clearml to work properly. And I am wondering whether you have maybe some guidelines for that.
Hey SuccessfulKoala55 Thank you for your answers I really appreciate it. As for elasticsearch it was indeed the index error that was created before. The reason for that is that I was trying to setup a backup for elasticsearch and mongodb using azurefiles. So the scenario is I'm using persistent volumes on k8s that are using azure file shares as storage. Then I can rebuild my cluster and use the exact same storage so that the data is persistent and I can restore my application from the last ...
Hi SuccessfulKoala55 Thanks for the response. For elastic I am using the image http://docker.elastic.co/elasticsearch/elasticsearch:7.6.2 the one that is in manifests in clearml repo. As for the clearml images I am using the latest tags everywhere. Let me restore the vm settings for elastic and I'll let you know ;)
Unfortunately the problem was not resolved nor by changing the vm memory settings back to 2 gb and by going back from azurefiles persistent volumes to hostPath. Seems odd as I did not have any of these issues before. I thought it might come from the changes in PV and elasticsearch settings but going back to the original settings did not resolve the issue. Shouldn't I be using the latest tag for clearml?
Yes. So jupyter lab is run as a tini command inside a pod. I attach a ps aux command from my pod
great! And the container name would be inferenced from the default_output_uri?
Yes they are. With mongo I had a problem connected with azurefiles and mongo who did not approve to mount azurefiles under /data/db as it could not initialize. The solution for that was to mount the azurefiles under different path and then specify command for mongo with path to the data so that it could initialize properly. However when I deleted a kubernetes cluster, created a new one and I redeployed clearml I had issues coming not from mongo anymore but from apiserver that was failing with...
That were my thoughts too. But the jupyter/base_notebook from docker stacks that they recommend to use and from which my image inherits did not include the token in the jupyter lab run command. I don't know whether it was a bug or an intentional choice, however I was either going to change the base image, or to add a token in a postStart hook. I decided to go with the second option 😉
Sry I had to rebuild my image to install curl. Yes the command works.
Hi AgitatedDove14 . I'm just writing to explain what was the problem. Basically our setup - jupyterhub on k8s with kubespawner that was spawning a pod for each single user notebook, uses docker images that are based on jupyter/docker-stacks.
The problem was that the token for jupyterhub api was not propagated to the spawned pod so whenever clearml was trying to access jupyter/user/api/sessions endpoint it would be redirected for authorization to jupyterhub api and then fail due to the lack ...
It works as well . As for rebuilding the image I was not a root nor a sudoer so I had either to rebuild docker image and set it to root or to install the package while rebuilding 😉
Anyway thank you so much for the help it seems like not a problem connected with clearml. I ll post my solution once I solve the problem 😉
Hi AgitatedDove14 . I am using jupyterhub on k8s and I spawn a pod for every singleuser. I have a custom dockerfile with clearml installed however I dont want to copy the clearml.conf file in the dockerfile and instead I would prefer to pass some neccessary configurations as ENV variables. Is it possible?
As for the clearml server version by latest tag I meant v 0.17.0
I'm sorry i was wrong. Niether of the commands give positive response. I actually get 404 page ... Sry I assumed I got a lot of data so it meant it was ok. But now I read into it