Answered

Hello, I am a first-time ClearML user. I deployed a ClearML server locally (successfully) and am now trying to deploy the agent in my Kubernetes cluster. I followed the Helm chart from "helm repo add clearml None " and changed the parameters below in the Helm chart values for the agent:

agentk8sglueKey: <API KEY>
agentk8sglueSecret: <ACCESS KEY>

# -- Reference to Api server url
apiServerUrlReference: " None "
# -- Reference to File server url
fileServerUrlReference: " None "
# -- Reference to Web server url
webServerUrlReference: " None "

All other values were left at their defaults.

The agent pod starts running, then restarts repeatedly and ends up in CrashLoopBackOff:

ubuntu@vm4v9lm3:~$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS        AGE
clearml-agent-7c6d58c497-xk8hn                   0/1     CrashLoopBackOff   9 (3m11s ago)   25m
clearml-apiserver-57d4f9776d-pgn6q               1/1     Running            0               7h58m
clearml-apiserver-asyncdelete-59484594b9-zdm4p   1/1     Running            0               7h58m
clearml-elastic-master-0                         1/1     Running            0               7h58m
clearml-fileserver-769d646d7-tzpg6               1/1     Running            0               7h58m
clearml-mongodb-5f995fbb5-mgwbt                  1/1     Running            0               7h58m
clearml-redis-master-0                           1/1     Running            0               7h58m
clearml-webserver-7df664dcbf-856f9               1/1     Running            0               7h58m
jupyter-notebook-84c6f6fcf9-4lrrv                1/1     Running            0               38m

The logs are below. Any idea what is wrong?
Is there any other value I need to update in the Helm chart for the agent?

/root/entrypoint.sh: line 29: /root/clearml.conf: Read-only file system

echo 'api.api_server: None '
/root/entrypoint.sh: line 30: /root/clearml.conf: Read-only file system
echo 'api.web_server: None '
/root/entrypoint.sh: line 31: /root/clearml.conf: Read-only file system
echo 'api.files_server: None '
/root/entrypoint.sh: line 32: /root/clearml.conf: Read-only file system
./provider_entrypoint.sh
source /root/.bashrc
++ '[' -z '' ']'
++ return
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
[[ -z '' ]]
python3 k8s_glue_example.py --queue default --namespace default --template-yaml /root/template/template.yaml
/usr/local/lib/python3.6/dist-packages/clearml_agent/_vendor/jwt/utils.py:7: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
Traceback (most recent call last):
  File "k8s_glue_example.py", line 8, in <module>
    from clearml_agent.glue.k8s import K8sIntegration
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/glue/k8s.py", line 19, in <module>
    from clearml_agent.commands.events import Events
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/__init__.py", line 3, in <module>
    from .worker import Worker
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/worker.py", line 47, in <module>
    from clearml_agent.commands.base import resolve_names, ServiceCommandSection
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/base.py", line 20, in <module>
    from clearml_agent.interface.base import ObjectID
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/__init__.py", line 7, in <module>
    from .base import Parser, base_arguments, add_service, OnlyPluralChoicesHelpFormatter
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/base.py", line 12, in <module>
    from clearml_agent.session import Session
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/session.py", line 23, in <module>
    from clearml_agent.helper.docker_args import DockerArgsSanitizer, sanitize_urls
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/helper/docker_args.py", line 279, in <module>
    class CustomTemplate(Template):
  File "/usr/lib/python3.6/string.py", line 74, in __init__
    cls.pattern = _re.compile(pattern, cls.flags | _re.VERBOSE)
  File "/usr/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)

  				
Posted 3 months ago by PompousCrow47


Answers 46

@CooperativeKitten94 @PompousCrow47

I figured it out. For future reference, this is an error related to the Kubernetes support in the agent: None

As for getting the credentials to launch the agent, the only way I could do it was manually via the UI. I could not find a way to get them via code.

  				
Posted 3 months ago by BraveGrasshopper38

These are the values set in Helm:

helm get values clearml-agent -n clearml-prod
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference:


  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:


  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: latest
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:


clearml:
  agentk8sglueKey: 8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT
  agentk8sglueSecret: oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU
  clearmlConfig: |-
    api {
        web_server:


        api_server:


        files_server:


        credentials {
            "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
            "secret_key" = "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
        }
    }
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
jcarvalho@kharrinhao:~$

  				
Posted 3 months ago by BraveGrasshopper38

Please replace those credentials on the Agent and try upgrading the helm release
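
For anyone scripting this step, here is a minimal sketch of building the upgrade command in Python. The release name and namespace are taken from the `helm get values` call earlier in the thread; the chart reference `clearml/clearml-agent` and the file name `values.yaml` are assumptions, so adjust them to your setup.

```python
import shlex
import subprocess


def helm_upgrade_command(release, chart, namespace, values_file):
    """Build the argv list for a `helm upgrade` call without running it."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--values", values_file,
    ]


if __name__ == "__main__":
    # Chart reference and values file name are assumptions, not from the thread.
    cmd = helm_upgrade_command(
        "clearml-agent", "clearml/clearml-agent", "clearml-prod", "values.yaml"
    )
    print(" ".join(shlex.quote(part) for part in cmd))
    # Uncomment to actually run the upgrade (requires helm on PATH):
    # subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems if a path or namespace ever contains unusual characters.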

  				
Posted 3 months ago by CooperativeKitten94

Yes, I am using those. They are hardcoded because I will generate them via a secure method at a later stage.

  				
Posted 3 months ago by BraveGrasshopper38

Hi! I'm using just a plain Kubernetes cluster (kubeadm) running on a Proxmox VM, and I'm using Argo to deploy the Helm chart in order to standardize it. Let me know if you need any more details!

  				
Posted 3 months ago by BraveGrasshopper38

And for dev I'm not providing any.

  				
Posted 3 months ago by BraveGrasshopper38

Hi @BraveGrasshopper38, following up on your last message, are you running in an OpenShift k8s cluster?

  				
Posted 3 months ago by CooperativeKitten94

The value field is a default that Argo falls back to if I don't provide any.

  				
Posted 3 months ago by BraveGrasshopper38

I had no issues deploying via GitHub, but Helm is considerably more confusing.

  				
Posted 3 months ago by BraveGrasshopper38

Also, to simplify the installation, can you use a simpler version of your values for now? Something like this should work:

agentk8sglue:
  apiServerUrlReference:


  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:


  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:


clearml:
  agentk8sglueKey: <NEW_KEY>
  agentk8sglueSecret: <NEW_SECRET>
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort

  				
Posted 3 months ago by CooperativeKitten94

Python regex error in the k8s glue agent:

sre_constants.error: bad inline flags: cannot turn on global flag at position 92

The issue is in the clearml-agent k8s glue codebase (Python 3.6 compatibility).
It is not configuration-related: it persists across different HOCON formats.
It affects image tags 1.24-21, 1.24-23, and latest.
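
To check whether a given interpreter is affected, the sketch below tries to compile a pattern that uses a scoped inline flag group, `(?a:...)`. As far as I know, Python 3.6 rejects this syntax with exactly the "cannot turn on global flag" message, while Python 3.7 and later accept it. The pattern itself is an assumption for illustration; the thread does not show the actual pattern used in `docker_args.py`.

```python
import re
import sys


def supports_scoped_ascii_flag() -> bool:
    """Return True if this interpreter accepts a scoped (?a:...) group."""
    try:
        # Illustrative pattern only; not copied from clearml-agent.
        re.compile(r"(?a:[_a-z][_a-z0-9]*)")
    except re.error:
        return False
    return True


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print("Python {}: scoped (?a:...) supported = {}".format(
        version, supports_scoped_ascii_flag()))
```

Running this with the `python3` inside the agent image (for example via `kubectl exec`) would show whether the image's interpreter is the limiting factor, independent of any Helm values.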

  				
Posted 3 months ago by BraveGrasshopper38

Because when I check, it references something from 3 years ago, and I am following this: None

  				
Posted 3 months ago by BraveGrasshopper38

Oh, okay. I'm not sure this will be the only issue, but you'll need these credentials to be valid, since the ClearML Agent uses them to connect to the ClearML Server 🙂
The easiest way to generate credentials is to open the ClearML UI in the browser, log in with an Admin user, then click the user icon in the top right corner and open Settings. From there go to "Workspace", click "Create new credentials", and use the values provided.
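
Once you have a key pair, one way to sanity-check it before putting it into the Helm values is to ask the API server for a token. The sketch below only builds the request; it assumes the API server exposes an `auth.login` endpoint that accepts HTTP Basic auth with the access key and secret key, and it uses `http://localhost:8008` as a placeholder address. Both are assumptions, so check them against your server.

```python
import base64
import urllib.request


def build_login_request(api_server, access_key, secret_key):
    """Build (but do not send) a Basic-auth request to the assumed auth.login endpoint."""
    raw = "{}:{}".format(access_key, secret_key).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    url = api_server.rstrip("/") + "/auth.login"
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


if __name__ == "__main__":
    # Placeholder address and credentials; replace with your own.
    req = build_login_request("http://localhost:8008", "<NEW_KEY>", "<NEW_SECRET>")
    print(req.full_url)
    # Uncomment to send the request; an HTTP error here suggests invalid credentials:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.status)
```

If the request is rejected, the agent would most likely fail to authenticate with the same key pair as well.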

  				
Posted 3 months ago by CooperativeKitten94

Hey @CooperativeKitten94! Are there any tips you can give me on this?

It seems like the most recent version supported for Kubernetes is clearml-agent==1.9.2?

Thanks again!

  				
Posted 3 months ago by BraveGrasshopper38

Oh, no worries, I understand 😄
Sure, if you could share the whole values and configs you're using to run both the server and the agent, that would be useful.
Also, what about the other Pods from the ClearML server: are there any other crashes or similar errors referring to a read-only filesystem? Are the server and agent installed on the same K8s node?

  				
Posted 3 months ago by CooperativeKitten94

If I run helm get values clearml-agent -n clearml-prod, the output is the following:
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference: None
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference: None
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: 1.25-1
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference: None
clearml:
  agentk8sglueKey: CLEARML8AGENT9KEY1234567890ABCD
  agentk8sglueSecret: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
  clearmlConfig: |-
    api {
        web_server: None
        api_server: None
        files_server: None
        credentials {
            "access_key" = "CLEARML8AGENT9KEY1234567890ABCD"
            "secret_key" = "CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456"
        }
    }
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort

  				
Posted 3 months ago by BraveGrasshopper38
