Answered
Hi there, our team started using ClearML a few months ago and we've recently deployed an AWS EKS k8s cluster with the hopes of deploying a clearml-agent. I've been able to install the agent on the cluster using:

Hi there,

Our team started using clearml a few months ago and we've recently deployed an AWS EKS k8s cluster with the hopes of deploying a clearml-agent. I've been able to install the agent on the cluster using:

helm install clearml-agent-gpu allegroai/clearml-agent \
    --set clearml.agentk8sglueKey=<removed> \
    --set clearml.agentk8sglueSecret=<removed> \
    --set agentk8sglue.defaultContainerImage="python:3.11-bullseye" \
    --set agentk8sglue.nodeSelector.nodegroup="clearml-agent" \
    --set agentk8sglue.queue="gpu-queue" \
    --set agentk8sglue.basePodTemplate.nodeSelector.nodegroup="clearml-gpu" \
    --set agentk8sglue.basePodTemplate.env[0].name=CLEARML_AGENT_GIT_USER \
    --set agentk8sglue.basePodTemplate.env[0].value=username \
    --set agentk8sglue.basePodTemplate.env[1].name=CLEARML_AGENT_GIT_PASS \
    --set agentk8sglue.basePodTemplate.env[1].valueFrom.secretKeyRef.name=git-password \
    --set agentk8sglue.basePodTemplate.env[1].valueFrom.secretKeyRef.key=git-password \
    --set clearml.clearmlConfig="
    agent {
      package_manager: {
        type: poetry;
        poetry_version: 1.8.2
      }
      force_git_ssh_protocol: false;
      disable_requirements_auto_install: true
    }"

I'm currently encountering two issues:

1. --set agentk8sglue.defaultContainerImage="python:3.11-bullseye" does not seem to change the container that gets used. Here are the logs from the pod:
Executing task id [9296dbfe38384daf958911e9155a8bca]:
repository = git@gitlab.com:<company_git_repo>.git
branch = <branch_name>
version_num = f0052fa186cab812a4aa07c05e088d466eb41ff7
tag =
docker_cmd = ubuntu:18.04
entry_point = main.py
working_dir = scope-ml/scope_mllib/training

Python executable with version '3.11' requested by the Task, not found in path, using '/usr/bin/python3' (v3.6.9) instead

It still tries to use 'ubuntu:18.04'. Am I doing this correctly?
2. I've created a k8s secret with the gitlab personal access token, but it seems like it is still unable to git pull the repo that is needed. Here are the logs from the pod:

cloning: git@gitlab.com:<company_git_repo>.git
Using user/pass credentials - replacing ssh url 'git@gitlab.com:<company_git_repo>.git' with https url '
<company_git_repo>.git'

pulling git
Using SSH credentials - replacing https url '
<company_git_repo>.git' with ssh url '
<company_git_repo>.git'
fatal: could not read Username for '
': terminal prompts disabled
error: Could not fetch origin
git pull failed: Command '['git', 'fetch', '--all', '--tags', '--recurse-submodules']' returned non-zero exit status 1.
Repository cloning failed: Command '['git', 'fetch', '--all', '--tags', '--recurse-submodules']' returned non-zero exit status 1.
Task failed: stopping task (4) exception

Any assistance would be much appreciated thanks!!

  
  
Posted 2 months ago

Answers 5


@<1754676270102220800:profile|AlertReindeer55> hi! Were you able to fix the second issue?

  
  
Posted 20 days ago

@<1523701070390366208:profile|CostlyOstrich36> , my container section is completely empty and unspecified.

The only place I can see "ubuntu:18.04" being specified is in the clearml-agent helm chart defaults, but the whole point of me running --set agentk8sglue.defaultContainerImage="python:3.11-bullseye" is that it's supposed to override that default image.

  
  
Posted 2 months ago

@<1754676270102220800:profile|AlertReindeer55> , I think what @<1523701087100473344:profile|SuccessfulKoala55> means is that you can set the docker image on the experiment level itself as well. If you go into the "EXECUTION" tab of the experiment, in the container section you might see an image there

  
  
Posted 2 months ago

@<1523701087100473344:profile|SuccessfulKoala55> , where/how is this specified? We are not setting this image anywhere. We are trying to override it with a different image, "python:3.11-bullseye". If my current way of overriding is incorrect, what is the correct way of doing this?

  
  
Posted 2 months ago

Hi @<1754676270102220800:profile|AlertReindeer55> ,
This:

Executing task id [9296dbfe38384daf958911e9155a8bca]:
repository = git@gitlab.com:<company_git_repo>.git
branch = <branch_name>
version_num = f0052fa186cab812a4aa07c05e088d466eb41ff7
tag =
docker_cmd = ubuntu:18.04
entry_point = main.py
working_dir = scope-ml/scope_mllib/training

Basically says the agent found the ubuntu:18.04 image specified on the task itself, which will always override any default container setting.
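
To make that precedence concrete, here is a minimal sketch. It is only a model of the rule described in this answer, not the agent's actual code; the function name resolve_container_image and its parameters are hypothetical and do not come from the ClearML API.

```python
from typing import Optional


def resolve_container_image(task_image: Optional[str], default_image: str) -> str:
    """Hypothetical model of the precedence described above.

    An image set on the task itself wins. The agent-level default
    (e.g. agentk8sglue.defaultContainerImage) is used only when the
    task's container setting is empty.
    """
    if task_image and task_image.strip():
        return task_image.strip()
    return default_image


# The task in the log carries docker_cmd = ubuntu:18.04, so the default is ignored.
print(resolve_container_image("ubuntu:18.04", "python:3.11-bullseye"))

# With no image recorded on the task, the default would apply.
print(resolve_container_image(None, "python:3.11-bullseye"))
```

If this model matches the agent's behaviour, the helm default only takes effect for tasks that have no container image stored on them, so the place to look is the task itself rather than the chart values.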

  
  
Posted 2 months ago