Hi, I am giving clearml-session another try and I am blocked by the error shown when the CLI tries to establish the tunnel

Answered

Hi, I am giving clearml-session another try and I am blocked by the following error, shown when the CLI tries to establish the tunnel:
Starting SSH tunnel
Warning: Permanently added '[10.xx.xx.xx]:xxxx' (ED25519) to the list of known hosts.
root@10.xx.xx.xx: Permission denied (publickey).
How can I solve that? (I am using the latest version of clearml-session.)

  				
Posted 3 years ago by JitteryCoyote63


Answers 30

Hi JitteryCoyote63 , can I assume you can ssh into the machine directly?

  				
Posted 3 years ago by CostlyOstrich36

Yes!

  				
Posted 3 years ago by JitteryCoyote63

Well not really

  				
Posted 3 years ago by JitteryCoyote63

"Well not really"

Please elaborate 🙂

  				
Posted 3 years ago by CostlyOstrich36

ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbMLo0.
Please contact your system administrator.
Add correct host key in /Users/H4dr1en/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /Users/H4dr1en/.ssh/known_hosts:81
Host key for 10.105.1.77 has changed and you have requested strict checking.
Host key verification failed.

  				
Posted 3 years ago by JitteryCoyote63

But after that you're connected to the machine and can work on it?

  				
Posted 3 years ago by CostlyOstrich36

So this message appears when I try to ssh directly into the instance

  				
Posted 3 years ago by JitteryCoyote63

After I started clearml-session

  				
Posted 3 years ago by JitteryCoyote63

If I don't start clearml-session, I can easily connect to the agent, so clearml-session is doing something that messes up the SSH config and prevents me from SSHing into the agent afterwards

  				
Posted 3 years ago by JitteryCoyote63

So I cannot ssh anymore to the agent after starting clearml-session on it

  				
Posted 3 years ago by JitteryCoyote63

That’s why I said “not really” 😄

  				
Posted 3 years ago by JitteryCoyote63

CostlyOstrich36 How is clearml-session setting the ssh config?

  				
Posted 3 years ago by JitteryCoyote63

I'm not sure, will check 🙂

  				
Posted 3 years ago by CostlyOstrich36

JitteryCoyote63 this is the standard SSH fix for a changed host key: remove the stale entry from known_hosts.
https://superuser.com/a/30089
Specifically, you can try:
ssh-keygen -R 10.105.1.77
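
A small sketch of the same cleanup for a server on a non-default port, which is how the tunnel endpoint appears in the first error ('[10.xx.xx.xx]:xxxx'). OpenSSH stores such servers in known_hosts as "[host]:port", so that is the name to pass to ssh-keygen -R. The helper names and the port 10022 are made up for illustration. Note that this only clears the host key warning; it is not expected to fix the "Permission denied (publickey)" error.

```python
import subprocess
from typing import List


def known_hosts_name(host: str, port: int = 22) -> str:
    """Return the name OpenSSH uses for this server in known_hosts."""
    # Port 22 is stored as the bare host; any other port as "[host]:port".
    return host if port == 22 else f"[{host}]:{port}"


def forget_host_command(host: str, port: int = 22) -> List[str]:
    """Build the ssh-keygen call that removes the keys stored for host:port."""
    return ["ssh-keygen", "-R", known_hosts_name(host, port)]


if __name__ == "__main__":
    # 10.105.1.77 is the address from the warning; 10022 is a made-up port.
    for port in (22, 10022):
        cmd = forget_host_command("10.105.1.77", port)
        print(" ".join(cmd))
        # Uncomment to actually edit ~/.ssh/known_hosts:
        # subprocess.run(cmd, check=True)
```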

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 I see in https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21= that a key pair is hardcoded in the repo. Is it being used to ssh to the instance?

  				
Posted 3 years ago by JitteryCoyote63

"Is it being used to ssh to the instance?"

It is used by the SSH client so that it "knows" the SSH server (does that make sense?)

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Yes, with the command you shared I can now ssh manually to the agent again, but clearml-agent still raises the same error

  				
Posted 3 years ago by JitteryCoyote63

"but clearml-agent still raises the same error"

Which one?

  				
Posted 3 years ago by AgitatedDove14

Sorry, I meant clearml-session. The error is the one I shared at the beginning of this thread

  				
Posted 3 years ago by JitteryCoyote63

JitteryCoyote63 are you running the agent in docker mode?

  				
Posted 3 years ago by AgitatedDove14

  				
Posted 3 years ago by JitteryCoyote63

This is the reason you are getting the error 🙂
Basically, the session asks the agent to set up a new SSH server with credentials on the remote machine. This is not an issue inside a container, since it is an isolated environment, but in venv mode the user running the agent is not root, so it cannot spin up or configure an SSH server.
Makes sense?
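
Following that explanation, the agent serving the session's queue would be started in docker mode. A minimal sketch that only assembles the clearml-agent daemon command line; the queue name "interactive" and the image in the usage lines are placeholders, not values from this thread.

```python
from typing import List, Optional


def agent_daemon_command(queue: str, docker: bool = True,
                         image: Optional[str] = None) -> List[str]:
    """Build the command that starts a clearml-agent on one queue."""
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    if docker:
        # Docker mode: every task (including the session) runs in a container.
        cmd.append("--docker")
        if image:
            # Optional default image for tasks that do not request one.
            cmd.append(image)
    return cmd


if __name__ == "__main__":
    # "interactive" and "ubuntu:22.04" are hypothetical values.
    print(" ".join(agent_daemon_command("interactive")))
    print(" ".join(agent_daemon_command("interactive", image="ubuntu:22.04")))
    print(" ".join(agent_daemon_command("interactive", docker=False)))
```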

  				
Posted 3 years ago by AgitatedDove14

I understand, but then why is docker mode an option of the CLI if we always have to use it for this to work?

  				
Posted 3 years ago by JitteryCoyote63

Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion

  				
Posted 3 years ago by JitteryCoyote63

"Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion"

This is a good point! I'll make sure we stress it (BTW: it will work with elevated credentials, but that is probably not recommended)

  				
Posted 3 years ago by AgitatedDove14

"(BTW: it will work with elevated credentials, but probably not recommended)"

What does that mean? I'm not sure I understand

  				
Posted 3 years ago by JitteryCoyote63

Does the agent install the nvidia-container toolkit, so that GPUs of the instance can be accessed from inside the docker running jupyterlab?

  				
Posted 3 years ago by JitteryCoyote63

This is a prerequisite of the Docker service installed on the host machine (where the agent is running).
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
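
Once the toolkit is installed on the host, a quick way to check that containers can see the GPUs is to run nvidia-smi inside a throwaway container, similar to the sample workload in the NVIDIA guide. A minimal sketch; the function names are made up, and the "ubuntu" image is just a convenient default.

```python
import shutil
import subprocess
from typing import List


def gpu_smoke_test_command(image: str = "ubuntu") -> List[str]:
    """Build a docker command that runs nvidia-smi in a temporary container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def gpus_visible_in_docker(image: str = "ubuntu") -> bool:
    """Return True if nvidia-smi succeeds inside a container on this host."""
    if shutil.which("docker") is None:
        # Docker is not installed (or not on PATH), so nothing to test.
        return False
    result = subprocess.run(gpu_smoke_test_command(image),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(gpu_smoke_test_command()))
    print("GPUs visible in docker:", gpus_visible_in_docker())
```

If this fails on the host, the JupyterLab container started by the agent will most likely not see the GPUs either.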

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 in the docs there is a --docker option (https://clear.ml/docs/latest/docs/apps/clearml_session/#running-in-docker). That's what confuses me, since the agent should always run in docker mode

  				
Posted 3 years ago by JitteryCoyote63

Yes, the agent's mode is global, i.e. all tasks run either inside docker or in a venv. In theory you can have two agents on the same machine, one in venv mode and one in docker mode, listening to two different queues
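
A rough sketch of that two-queue setup from the client side: keep a record of which queue is served by which kind of agent, and always point clearml-session at a docker-mode queue. The queue names and helper names are hypothetical; only the --queue option of clearml-session is assumed here.

```python
from typing import Dict, List

# Hypothetical layout: queue name -> True if its agent runs in docker mode.
AGENT_MODES: Dict[str, bool] = {"venv_queue": False, "docker_queue": True}


def pick_session_queue(agent_modes: Dict[str, bool]) -> str:
    """Return the first queue that is served by a docker-mode agent."""
    for queue, docker_mode in agent_modes.items():
        if docker_mode:
            return queue
    raise ValueError("no docker-mode agent available for clearml-session")


def session_command(agent_modes: Dict[str, bool]) -> List[str]:
    """Build a clearml-session call that targets a docker-mode queue."""
    return ["clearml-session", "--queue", pick_session_queue(agent_modes)]


if __name__ == "__main__":
    print(" ".join(session_command(AGENT_MODES)))
```

Regular training tasks can still be enqueued on the venv queue; only the interactive session needs the docker one.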

  				
Posted 3 years ago by AgitatedDove14
