Answered
Hi, I saw this in the clearml-agent docs, but other than the Docker image, I'm not sure how to integrate this with the clearml Python package and clearml-server. Please advise.

Two K8s integration flavours
Spin ClearML-Agent as a long-lasting service pod
- use clearml-agent docker image
- map docker socket into the pod (soon replaced by podman)
- allow the clearml-agent to manage sibling dockers
- benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
- downside: Sibling containers

  
  
Posted 3 years ago

Answers 23


Can you run the entire thing on your own machine (just making sure it doesn't give this odd error) ?

  
  
Posted 3 years ago

Actually it hasn't changed ...

  
  
Posted 3 years ago

Hi AgitatedDove14 , i've got the same error. It would appear that the code references clearml_agent/helper/base.py which i believe is part of clearml-agent v0.17.1. Could that be the issue?

  
  
Posted 3 years ago

SubstantialElk6 I just executed it, and everything seems okay on my machine.
Could you pull the latest clearml-agent from GitHub and try again?

EDIT:
just try to run:
git clone
cd clearml-agent
python examples/k8s_glue_example.py

  
  
Posted 3 years ago

python k8s_glue_example.py --queue gpu --namespace default
Traceback (most recent call last):
  File "k8s_glue_example.py", line 86, in <module>
    main()
  File "k8s_glue_example.py", line 80, in main
    namespace=args.namespace,
  File "/home/administrator/clearml-agent-k8s/venv/lib/python3.6/site-packages/clearml_agent/helper/base.py", line 239, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'
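
A TypeError of this shape typically means the example script passes a keyword argument that the installed class does not accept, for instance when the script is newer than the installed package. Below is a minimal sketch of how one could check for that kind of mismatch. `OldK8sIntegration` and its parameters are hypothetical stand-ins for illustration, not the real clearml-agent class.

```python
import inspect


class OldK8sIntegration:
    """Hypothetical stand-in for an older installed class without base_pod_num."""

    def __init__(self, ports_mode=False, num_of_services=20):
        self.ports_mode = ports_mode
        self.num_of_services = num_of_services


def unsupported_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


# Arguments a newer example script might pass (values are illustrative).
requested = {"ports_mode": False, "num_of_services": 20, "base_pod_num": 1}
print(unsupported_kwargs(OldK8sIntegration, requested))
```

If the list is not empty, the script and the installed package are probably from different versions, and aligning the two (both from the same checkout or the same release) is the likely fix.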

  
  
Posted 3 years ago

TypeError: __init__() got an unexpected keyword argument 'base_pod_num'

Could you post the entire log?

  
  
Posted 3 years ago

For example:
examples/k8s_glue_example.py --queue k8s_gpu - --namespace pod-clearml-conf ~/trains.conf --template-yaml example/base.yml

  
  
Posted 3 years ago

the default for base_pod_num is 1.

  
  
Posted 3 years ago

python k8s_glue_example.py --help
To get all the commands for configurations.
You should probably pass a few :)

  
  
Posted 3 years ago

first line to make sure kubectl is connected to k8s.

  
  
Posted 3 years ago

This is probably the whole script.

kubectl get nodes
pip install clearml-agent
python k8s_glue_example.py

  
  
Posted 3 years ago

SubstantialElk6 what's the command line you are using?

  
  
Posted 3 years ago

So i kept trying, but i'm stuck on this when i run python k8s_glue_example.py
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'


  
  
Posted 3 years ago

Hi SubstantialElk6
1. Yes, this is the queue the glue will pull jobs from and push into the k8s. You can create a new queue from the UI (go to the workers&queues page, then the Queue Tab, and press "create new").
2. Ignore it 🙂 this is if you are using config maps and need TCP routing to your pods.
3. As you noted, this is basically all the arguments you need to pass for (2). Ignore them for the time being.
4. This is the k8s overrides to use if launching the k8s job with kubectl (basically --overrides).
5. If passed, instead of calling kubectl run, you provide a k8s template for kubectl apply.

The doc also mentioned preconfigured services with selectors in the form of "ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.

Unless you need TCP routing to the pods you can ignore this part
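
For anyone who does need the TCP routing, here is a rough sketch of what such preconfigured services could look like, generated as plain Kubernetes Service manifests. Only the selector label and the targetPort of 10022 come from the doc quoted above. The service names, the `base_port` value and the port numbering scheme are assumptions made for illustration.

```python
import json


def build_pod_service(pod_number, base_port=30000, namespace="default"):
    """Build a Service manifest (as a dict) that routes one port to one numbered pod.

    base_port and the service name are hypothetical; adjust to your cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"clearml-pod-{pod_number}", "namespace": namespace},
        "spec": {
            # Selector label in the form quoted from the doc.
            "selector": {"ai.allegro.agent.serial": f"pod-{pod_number}"},
            "ports": [
                {
                    "protocol": "TCP",
                    "port": base_port + pod_number,  # assumed numbering scheme
                    "targetPort": 10022,  # from the doc
                }
            ],
        },
    }


# One service per pod number; kubectl also accepts JSON manifests.
services = [build_pod_service(n) for n in range(1, 4)]
print(json.dumps(services[0], indent=2))
```

Each printed manifest could be saved to a file and applied with kubectl. How many services to create and which ports to expose depends on the --ports-mode arguments you pass to the glue.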

  
  
Posted 3 years ago

The doc also mentioned preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022. Would you have any examples of how to do this?

  
  
Posted 3 years ago

Hi, i tried the k8s-glue on my k8s setup and needed some clarifications on some of the arguments.
1. --queue. Does this only refer to default and service? How can i create a new queue that it can sync with the ClearML server?
2. --ports-mode. I'm not sure what ports mode does. The doc says "add a label to the pod which can be used as service". Which pod is it referring to in the first place?
3. All args pertaining to --ports-mode (e.g. base-pod-num, gateway-address, etc.).
4. --overrides-yaml. What is the default yaml?
5. --template-yaml. Do you have a sample of this?

  
  
Posted 3 years ago

Ok, that seems clearer, thanks.

  
  
Posted 3 years ago

Hi SubstantialElk6
No need for that, you can use the helm chart (or spin them once with kubectl), then they take care of scheduling by themselves.
You can also use the k8s glue (basically spinning kubernetes pods automatically for you, based on the Tasks that you push into the ClearML queue)
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py

In short, two possible deployments
1. Static: a k8s pod running the agent (the agent then runs all the experiments inside the pod or as sibling pods).
2. Dynamic: the k8s-glue pulls Tasks from the ClearML queue, creates a k8s job and sends it to k8s (notice the job itself is the clearml-agent running the specific Task for us, including cloning the code, python packages, arguments etc.).
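
To make the dynamic flavour more concrete, here is a toy sketch of the idea: take Task IDs from a queue, wrap each in a pod manifest, and hand it to a submit function. This is not the actual glue code. The in-memory queue, the image name, the pod naming and the `clearml-agent execute --id` command line are all assumptions for illustration; the real implementation lives in the k8s_glue_example.py linked above.

```python
from collections import deque


def build_pod_spec(task_id, image="example/clearml-agent-image", namespace="default"):
    """Build a pod manifest (as a dict) whose container runs a single Task.

    The image name and the command are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"clearml-id-{task_id}", "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "clearml-task",
                    "image": image,
                    # Assumed command: the agent inside the pod runs this one Task.
                    "command": ["clearml-agent", "execute", "--id", task_id],
                }
            ],
        },
    }


def drain_queue(queue, submit):
    """Pop every pending Task ID and submit one pod manifest per Task."""
    launched = []
    while queue:
        task_id = queue.popleft()
        submit(build_pod_spec(task_id))
        launched.append(task_id)
    return launched


# Stand-ins: a deque instead of a ClearML queue, a list instead of kubectl.
pending = deque(["task-a", "task-b"])
submitted = []
drain_queue(pending, submitted.append)
print([pod["metadata"]["name"] for pod in submitted])
```

In a real setup the queue would be the ClearML execution queue and `submit` would apply the manifest to the cluster, which is what lets k8s handle the scheduling.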

  
  
Posted 3 years ago

Hi, so this means if i want to use Kubernetes, i would have to 'manually' install multiple agents on all the worker nodes?

  
  
Posted 3 years ago

SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
- Virtual environment mode, where it will create a new venv for each experiment based on the "Installed packages" section in the Task; this section is fully requirements.txt compatible. If "Installed packages" is empty, it will revert to the requirements.txt from the repo itself.
- Docker mode, where the agent will spin a docker container (see Task Execution Tab, base docker image), then inside the container it will clone the repository and install the packages based on the "Installed packages" section (just like in the venv mode).

Make sense?
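
The package fallback described above can be pictured with a small sketch. This only illustrates the decision, not the agent's actual code. The function name and the way the two inputs are passed in as strings are assumptions.

```python
def pick_requirements(installed_packages, repo_requirements):
    """Return (source, lines): prefer the Task's "Installed packages" section,
    fall back to the repository's requirements.txt when that section is empty."""
    task_lines = [ln.strip() for ln in installed_packages.splitlines() if ln.strip()]
    if task_lines:
        return "installed_packages", task_lines
    repo_lines = [ln.strip() for ln in repo_requirements.splitlines() if ln.strip()]
    return "requirements.txt", repo_lines


# Task with an explicit package list: the repo file is ignored.
print(pick_requirements("torch==1.8.0\nnumpy\n", "pandas\n"))

# Task with an empty section: fall back to the repo's requirements.txt.
print(pick_requirements("", "pandas\n"))
```

The same choice applies in both modes; the only difference is whether the packages end up in a fresh venv on the host or inside the docker container.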

  
  
Posted 3 years ago

I would like to run ClearML agent on kubernetes. So basically I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.

  
  
Posted 3 years ago

Are you asking regarding the k8s integration?
(This is not a must, you can run the clearml-agent bare-metal on any OS)

  
  
Posted 3 years ago

Hi SubstantialElk6
I'm not sure what you are asking 🙂
Basically the clearml-agent will pull a Task from an execution queue, and execute it (based on the definition on the Task, i.e. git repo, python packages docker image etc.)

  
  
Posted 3 years ago