I think Mongo does not like having its db folder replaced like this while the Pod is running.
You can try turning Mongo off for a moment (scale its Deployment down to 0 replicas), then create a one-time Pod (non-mongo; an ubuntu image works fine, for example) mounting the same volume that Mongo was mounting, and use this Pod to copy the db folder into the right place. When it's done, delete the Pod and scale the Mongo Deployment back up to 1. A rough sketch of such a one-time Pod is below.
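Something like this should work as the one-time Pod (a sketch only; the Pod name and the claimName are placeholders you'd replace with your actual Mongo PVC):

apiVersion: v1
kind: Pod
metadata:
  name: mongo-data-copy
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: ubuntu:22.04
      # keep the Pod alive so we can exec into it
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: mongo-data
          mountPath: /data
  volumes:
    - name: mongo-data
      persistentVolumeClaim:
        claimName: <your-mongo-pvc>

Then kubectl exec into it, move the db folder into place under /data, delete the Pod, and scale Mongo back up.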
Hello @<1523708147405950976:profile|AntsyElk37> 🙂
You are right, the spec.runtimeClassName field is not supported in the Agent at the moment; I'll work on your Pull Request ASAP.
Could you elaborate a bit on why you need Task Pods to specify the runtime class to use GPUs?
Usually, you'd set a Pod container's resources.limits with, for example, nvidia.com/gpu: 1, and the Nvidia Device Plugin would itself assign the correct device to the container. Will that work?
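For illustration, a minimal Pod spec fragment along those lines (the Pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-task
spec:
  containers:
    - name: task
      image: <your-task-image>
      resources:
        limits:
          # the device plugin picks a free GPU and wires it into the container
          nvidia.com/gpu: 1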
Hey @<1734020156465614848:profile|ClearKitten90> - You can try with the following in your ClearML Agent override helm values. Make sure to replace mygitusername and git-password with your own values:
agentk8sglue:
  basePodTemplate:
    env:
      # to setup access to private repo, setup secret with git credentials
      - name: CLEARML_AGENT_GIT_USER
        value: mygitusername
      - name: CLEARML_AGENT_GIT_PASS
        valueFrom:
          secretKeyRef:
            name: git-password
            ...
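To create that secret, something along these lines should work (the key name here is an assumption; it must match the key field of the secretKeyRef, which is truncated above):

kubectl create secret generic git-password \
  --namespace <clearml-agent-namespace> \
  --from-literal=password='<your-git-password-or-token>'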
Sure! I'll talk to the guys to update the documentation 🙂
Hey @<1743079861380976640:profile|HighKitten20> - Try to configure this section in the values override file for the Agent helm chart:
# -- Private image registry configuration
imageCredentials:
  # -- Use private authentication mode
  enabled: false
  # -- If this is set, chart will not generate a secret but will use what is defined here
  existingSecret: ""
  # -- Registry name
  registry: docker.io
  # -- Registry username
  username: someone
  # -- Registry password
  password: pwd...
In your last message, you are referring to Pod security contexts and admission controllers enforcing policies such as a read-only filesystem. Is that the case in your cluster?
Or was this the output of a GPT-like chat? If so, please do not use LLMs to generate values for the helm installation, as they usually don't produce a useful or real config.
So if you now run helm get values clearml-agent -n <NAMESPACE>, where <NAMESPACE> is the value you have in the $NS variable, can you confirm this is the full and only output? Of course, the $VARIABLES will have their real values:
agentk8sglue:
  # Try newer image version to fix Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "1.25-1"
    pullPolicy: Always
  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "ht...
Wonderful - We do not have such a feature planned for now; feel free to contribute 🙂
Do you mean the Python version that is installed on the clearml agent itself? Or do you mean the Python version available in tasks that will be run from the agent?
Hey @<1734020208089108480:profile|WickedHare16> , could you please share your override values file for the clearml helm chart?
Hey @<1736194540286513152:profile|DeliciousSeaturtle82> , yes, please try changing the health check to /debug.conf or /debug.ping 🙂
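To verify the endpoint responds, something like this should work (assuming the default apiserver port 8008; adjust the host and port to your setup):

curl http://<apiserver-host>:8008/debug.ping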
@<1736194540286513152:profile|DeliciousSeaturtle82> the data folder for mongo4 and mongo5 might be slightly different. What is the target path where you're moving data in mongo5? And how is that mounted?
And when you say "broken", could you elaborate on that? Does the target Mongo Pod crash when trying to move the data? Or do you succeed in copying the data but can't see the result in the UI?
It's a bit hard for me to provide support here with the additional layer of Argo.
I assume the server is working fine and you can open the clearml UI and log in, right? If yes, would it be possible to extract the Agent part only, out of Argo, and proceed with installing it through standard helm?
Hi @<1798162812862730240:profile|PreciousCentipede43> 🙂
When you say "one of the api backend is UNHEALTHY", do you mean you have multiple replicas of the apiserver component (i.e. you set the value apiserver.replicaCount > 1) and one of them is not ready?
Could you please share the output of the kubectl describe command for the ClearML apiserver Deployment?
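For example (assuming the default Deployment name from the chart; adjust it to match your release):

kubectl describe deployment clearml-apiserver -n <NAMESPACE>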
Hi Amir, could you please share the values override that you used to install the clearml server helm chart?
Hey @<1726047624538099712:profile|WorriedSwan6> - I am sorry, I forgot that the multi-queue feature with templateOverrides is only available in the enterprise version.
What you can do, though, is deploy two different agents in k8s using the helm chart. Simply install two different releases, then modify only one of them so its basePodTemplate requests nvidia.com/gpu: "4" as a resource limit, as sketched below.
Let me know if this solves your issue 🙂
@<1669152726245707776:profile|ManiacalParrot65> could you please send your values file override for the Agent helm chart?
Hey @<1649221394904387584:profile|RattySparrow90> - You can try configuring CLEARML__logging__root__level as an extraEnvs entry for the apiserver and fileserver 🙂
The value can be DEBUG, INFO, WARNING, ERROR, or CRITICAL.
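For example, in the server chart's override values (a sketch; verify the extraEnvs key against your chart version):

apiserver:
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"
fileserver:
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"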
Hey @<1734020208089108480:profile|WickedHare16> - Not 100% sure this is the issue, but I noticed a wrong configuration in your values.
You configured both these:
elasticsearch:
  enabled: true
externalServices:
  # -- Existing ElasticSearch connectionstring if elasticsearch.enabled is false (example in values.yaml)
  elasticsearchConnectionString: "[{\"host\":\"es_hostname1\",\"port\":9200},{\"host\":\"es_hostname2\",\"port\":9200},{\"host\":\"es_hostname3\",\"port\":9200}]"
Please use only one of the two: either the bundled elasticsearch or the external connection string.
@<1734020208089108480:profile|WickedHare16> - please try configuring the cookieDomain:
clearml:
  cookieDomain: ""
You should set it to your base domain, for example pixis.internal, without any api or files prefix in front of it.
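So, assuming pixis.internal is your base domain, the override would look like this:

clearml:
  cookieDomain: "pixis.internal"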
Oh no worries, I understand 😄
Sure, if you could share the whole values and configs you're using to run both the server and agent that would be useful.
Also, what about other Pods from the ClearML server, are there any other crashes or similar errors referring to a read-only filesystem? Are the server and agent installed on the same K8s node?
Oh, I see, because you are using a self-signed certificate, correct?
Also, in order to simplify the installation, can you use a simpler version of your values for now? Something like this should work:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:
clearml:
  agentk8sglueKey: <NEW_KEY>...
@<1726047624538099712:profile|WorriedSwan6> could you please run a kubectl describe pod of the clearml webserver Pod and dump the output here?
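For example (the Pod name will differ per release; list the Pods first to find the webserver one):

kubectl get pods -n <NAMESPACE> | grep webserver
kubectl describe pod <webserver-pod-name> -n <NAMESPACE>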
So CLEARML8AGENT9KEY1234567890ABCD is the actual real value you are using?
Can you try with these values? Specifically, the changes are: not using clearmlConfig, not overriding the image (using the default), and not defining resources:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  queue: default
  webServerUrlReference:
clearml:
  agentk8sglueKey: 8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT
  agentk8sglueSecret: oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWear...
Hi @<1523708147405950976:profile|AntsyElk37> - There are a few points missing for the PR to be completed; let's follow up on GitHub. See my comments here: None