Hi, I have run k8s_glue_example.py on my on-prem K8s, and have preconfigured NodePort services. I was able to use ClearML Session to create pods, but the SSH tunneling failed. It tried to connect to the ClusterIP of the pod and port 10020 instead of node IP and

Answered

Hi, I ran k8s_glue_example.py on my on-prem K8s cluster, with preconfigured NodePort services. I was able to use ClearML Session to create pods, but the SSH tunneling failed: it tried to connect to the pod's ClusterIP on port 10020 instead of the node IP and NodePort. How should I fix it?
The following is my Service YAML:

kind: Service
apiVersion: v1
metadata:
  name: clearml-agent-1-nodeprot
  namespace: clearml
spec:
  ports:
  - name: clearml-agent-ssh
    port: 10022
    targetPort: 10022
  type: NodePort
  selector:
    ai.allegro.agent.serial: pod-1
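
Since this manifest does not set nodePort, Kubernetes assigns one from the cluster's node port range (30000-32767 by default), and that assigned value is the port a client outside the cluster has to use. As a minimal sketch, assuming the Service has been dumped to JSON (for example with kubectl get service clearml-agent-1-nodeprot -n clearml -o json), the assigned port could be read like this. The helper name find_node_port and the sample data are hypothetical, and the nodePort value shown is only illustrative.

```python
import json
from typing import Optional


def find_node_port(service: dict, port_name: str) -> Optional[int]:
    """Return the nodePort of the named port in a Service object, or None."""
    for port in service.get("spec", {}).get("ports", []):
        if port.get("name") == port_name:
            return port.get("nodePort")
    return None


# Hypothetical, trimmed Service JSON; the nodePort value is illustrative,
# the real one is whatever the cluster assigned.
sample_service = json.loads("""
{
  "kind": "Service",
  "metadata": {"name": "clearml-agent-1-nodeprot", "namespace": "clearml"},
  "spec": {
    "type": "NodePort",
    "ports": [
      {"name": "clearml-agent-ssh", "port": 10022, "targetPort": 10022, "nodePort": 31919}
    ]
  }
}
""")

assigned_port = find_node_port(sample_service, "clearml-agent-ssh")
print(assigned_port)
```

The value returned here, together with a node IP, is what an SSH client outside the cluster would connect to; the ClusterIP and port 10022 are only reachable from inside the cluster.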

  				
Posted one year ago by PunySquid51


Answers 4

Hi PunySquid51, clearml-session uses port 10022 by default. You can use the --remote-ssh-port command line option to specify a different port.

  				
Posted one year ago by SuccessfulKoala55

I also tried running ssh -P 31919 root@10.190.253.18, but got the error message: ssh: connect to host 0.0.124.175 port 22: Unknown error.
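
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: OpenSSH takes the port with a lowercase -p, so with -P the value 31919 may have been treated as the host. A bare integer can be interpreted as a 32-bit IPv4 address, and 31919 corresponds exactly to the 0.0.124.175 in the message, with the port falling back to the default 22. The short check below shows the conversion; the variable names are only illustrative.

```python
import ipaddress

# The number that was meant to be the port in the command above.
port_typed_as_host = 31919

# Integers below 2**32 are interpreted as IPv4 addresses.
address = ipaddress.ip_address(port_typed_as_host)
print(address)
```

If that reading is right, the intended command would be ssh -p 31919 root@10.190.253.18.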

  				
Posted one year ago by PunySquid51

Hi SuccessfulKoala55, even when I run clearml-session with the --remote-ssh-port and --remote-gateway command line options, the SSH tunneling still fails.
These are my complete steps:

Set up the K8s Service with the following YAML:

kind: Service
apiVersion: v1
metadata:
  name: clearml-agent-1-nodeprot
  namespace: clearml
spec:
  ports:
  - name: clearml-agent-ssh
    port: 10022
    targetPort: 10022
    nodePort: 31919
  type: NodePort
  selector:
    ai.allegro.agent.serial: pod-1

Run python k8s_glue_example.py --queue gpu-1 --ports-mode --template-yaml gpu-1.yml on the K8s node.
Run clearml-session --docker nvidia/cuda:11.0.3-runtime-ubuntu20.04 --remote-gateway 10.190.253.18 --remote-ssh-port 31919 on my PC. 10.190.253.18 is the IP of the node where the session pod is running.
The clearml-session log on my PC:

Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel to root@10.190.253.18, port 31919

SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel to root@10.190.253.18, port 31919
.......

Could you provide a complete example or tutorial?
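
When the tunnel keeps retrying like this, one way to narrow it down is to check, independently of clearml-session, whether anything answers on the node IP and NodePort at all. The sketch below is a generic TCP probe, not part of ClearML; probe_ssh is a hypothetical helper name. It relies on the fact that an SSH server normally sends an identification line starting with "SSH-" as soon as a client connects.

```python
import socket


def probe_ssh(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect to host:port and return the first data the server sends.

    Raises OSError (including socket.timeout) if the port cannot be
    reached or the server sends nothing within the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        banner = sock.recv(256)
    return banner.decode("ascii", errors="replace").strip()


# Example call with the values from the steps above (run from the PC):
#   print(probe_ssh("10.190.253.18", 31919))
```

A reply beginning with "SSH-" suggests the NodePort is forwarding to an SSH daemon in the pod, so the problem is more likely in authentication or client settings. A connection error or timeout points instead at the Service, its selector, or a firewall between the PC and the node.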

  				
Posted one year ago by PunySquid51

SuccessfulKoala55, I solved it with other settings, thanks.

  				
Posted one year ago by PunySquid51
