Answered
I'm trying to use the K8s-glue agent. To do so, I've followed these steps:

  • Created the clearml namespace (the kubectl commands to create it and apply the manifests are sketched after the pod manifest below)
  • Created a secret from this template:
apiVersion: v1
kind: Secret
metadata:
  name: k8s-glue-pod-template
stringData:
  pod_template.yml: |
    apiVersion: v1
    metadata:
      namespace: clearml
    spec:
      containers:
        - resources:
            limits:
              cpu:             1000m
              memory:          4G
            requests:
              cpu:             1000m
              memory:          4G
      restartPolicy: Never
  • Created a service account that allows controlling the clearml namespace:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: clearml-service-account
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-manager-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-manager-rolebinding
subjects:
  - kind: ServiceAccount
    name: clearml-service-account
    namespace: clearml
roleRef:
  kind: Role
  name: pod-manager-role
  apiGroup: rbac.authorization.k8s.io
  • Installed the pod:
apiVersion: v1
kind: Pod
metadata:
  name: k8s-glue
spec:
  serviceAccountName: "clearml-service-account"
  containers:
    - name: k8s-glue-container
      image: allegroai/clearml-agent-k8s:base-1.21
      imagePullPolicy: Always
      command: [
        "/bin/bash",
        "-c",
        "source /root/.bashrc && /root/entrypoint.sh"
      ]
      volumeMounts:
        - name: pod-template
          mountPath: /root/template
      env:
        - name: CLEARML_API_HOST
          value: ""  # custom port!
        - name: CLEARML_WEB_HOST
          value: ""
        - name: CLEARML_FILES_HOST
          value: ""
        #        - name: K8S_GLUE_MAX_PODS
        #          value: "2"
        - name: K8S_GLUE_QUEUE
          value: "k8s-glue"
        - name: K8S_GLUE_EXTRA_ARGS
          value: "--template-yaml /root/template/pod_template.yml"
        - name: CLEARML_API_ACCESS_KEY
          value: "***"
        - name: CLEARML_API_SECRET_KEY
          value: "***"
        - name: CLEARML_WORKER_ID
          value: "k8s-glue-agent"
        - name: CLEARML_AGENT_UPDATE_REPO
          value: ""
        - name: FORCE_CLEARML_AGENT_REPO
          value: ""
        - name: CLEARML_DOCKER_IMAGE
          value: "ubuntu:22.04"
  volumes:
    - name: pod-template
      secret:
        secretName: k8s-glue-pod-template
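
For reference, a rough sketch of how these manifests can be created and applied (secret.yaml, rbac.yaml and pod.yaml are just placeholder file names for the three snippets above):

# create the namespace first, then apply everything into it
kubectl create namespace clearml
kubectl apply -f secret.yaml -n clearml
kubectl apply -f rbac.yaml -n clearml
kubectl apply -f pod.yaml -n clearml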

After pushing the first experiment I'm getting this error:

Ex: Expecting value: line 1 column 1 (char 0)
Failed deleting completed/failed pods for ns clearml: Command '['bash', '-c', 'kubectl delete pod -l=CLEARML=agent-74b23a8f --namespace=clearml --field-selector=status.phase!=Pending,status.phase!=Running --output name']' returned non-zero exit status 127.

Even after dequeuing the experiment and keeping the queue clean, the error keeps looping.
What can be done here?
There's nothing helpful in the UI; the console shows:

task a9e29945e78c43b28a9d8d1fcb2f088f pulled from 4a6c8de54dbe4fb0ae7f979611637a01 by worker k8s-glue-agent
  
  
Posted 9 months ago

Answers 5


I suggest exec'ing into the pod and issuing the command kubectl delete pod -l=CLEARML=agent-74b23a8f --namespace=clearml --field-selector=status.phase!=Pending,status.phase!=Running --output name
so you can see the output from inside the pod. This should help you understand what is going on with the command.
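
For example (a sketch, assuming the glue pod is named k8s-glue in the clearml namespace as in the manifest above):

# open a shell inside the glue pod
kubectl exec -it k8s-glue -n clearml -- /bin/bash

# inside the pod: check that kubectl is on the PATH (exit status 127 usually means "command not found")
which kubectl

# run the exact command the agent runs and print its exit code
kubectl delete pod -l=CLEARML=agent-74b23a8f --namespace=clearml --field-selector=status.phase!=Pending,status.phase!=Running --output name
echo $?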

  
  
Posted 9 months ago

From inside the pod it's working fine.

  
  
Posted 9 months ago

The agent runs the command inside the pod exactly like you did when exec'ing into the pod and launching it manually. If one returns 127 while the manual run is fine, it looks to me like the command being issued is not the same. What chart version are you using?
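
If it was installed with the ClearML Helm chart, something like this would show the deployed chart version:

# list Helm releases in the clearml namespace; the CHART column carries the chart version
helm list -n clearml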

  
  
Posted 9 months ago

I didn't use the chart, I used this example: None

But I tried using the build-resources to build my own image, with kubectl installed by me, and it worked 🙂
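
The kubectl part of such an image build is roughly this (a sketch; v1.26.0 is only a placeholder, pick a release that fits your cluster):

# download an official kubectl release binary and put it on the PATH
curl -LO "https://dl.k8s.io/release/v1.26.0/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin/kubectl
kubectl version --client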

  
  
Posted 9 months ago

I think the kubectl version needs to match the cluster version, or be higher.
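
For example, running this inside the agent image shows both the client version and the cluster's server version so they can be compared:

# prints the kubectl client version and the cluster's server version
kubectl version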

  
  
Posted 9 months ago