The agent is started with this command: clearml-agent --debug daemon --queue gpu --gpus 0 --foreground --docker <gitlab org registry>/project-precog/clearml_config
Ok, it makes sense. But it’s running in docker mode and it is trying to ssh into the host machine and failing
The task log is here:
the log on my local machine is here:
The agent is started from a non-root user if that matters
I can telnet the port from my mac:
(base) *[main][~/Documents/plant_age]$ telnet 192.168.1.55 10022
Trying 192.168.1.55...
Connected to 192.168.1.55.
Escape character is '^]'.
SSH-2.0-OpenSSH_8.4p1 Debian-5+deb11u1
^C
I mean if I enter my host machine ssh password it works. But we will disable password auth in future, so it’s not an option
Btw it seems the docker runs in network=host
Yes, this is so if you have multiple agents running on the same machine they can find a new open port 🙂
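To illustrate the "find a new open port" idea with host networking, here is a minimal sketch. It is not ClearML's actual implementation; the function name `first_free_port`, the starting port 10022 (taken from this thread), and the scan length are assumptions made for the example.

```python
import socket


def first_free_port(start: int = 10022, attempts: int = 100, host: str = "0.0.0.0") -> int:
    """Return the first port >= start that can be bound on this machine.

    Hypothetical helper: it only shows the general technique of probing
    ports by trying to bind them, not how clearml-agent really does it.
    """
    last = min(start + attempts, 65536)  # never probe past the valid port range
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is taken (e.g. by another agent's container), try the next one.
                continue
            return port
    raise RuntimeError(f"no free port found in range {start}-{last - 1}")
```

With network=host every container shares the host's port space, so a second agent on the same machine would skip 10022 if it is already taken and get the next free port instead.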
I can telnet the port from my mac:
Okay this seems like it is working
same: Not Found (#404)
May I suggest you DM it to me (so it is not public)?
I start clearml-session on my mac this way: clearml-session --queue gpu --docker registry.gitlab.com/periplo-innovation/project-precog/clearml_config
I mean if I enter my host machine ssh password it works. But we will disable password auth in future, so it’s not an option
To clarify, it should not allow users to SSH into the host machine (if you can do that, it means you own it). It only allows users to SSH into the container that the host machine spins up. Make sense?
But what should I do? It does not work, it says incorrect password as you can see
Set the following: CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is that the agent automatically mounts the host's .ssh into the container, so that you have credentials if you clone git over SSH. In your case it also mounts the configuration, which is why the login fails.
I will make sure we add it to the configuration file, so it is more visible
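If the agent is launched from a script rather than typed into a shell, the same setting can be passed through the process environment. A minimal sketch, assuming a hypothetical helper `agent_launch_spec`; the environment variable and the CLI flags are the ones quoted in this thread, everything else is illustrative.

```python
import os


def agent_launch_spec(queue: str, gpus: str, docker_image: str):
    """Build the environment and argv for starting clearml-agent in docker mode.

    Hypothetical helper: it only prepares the launch, it does not start anything.
    """
    env = dict(os.environ)
    # Tell the agent not to mount the host's .ssh into the container.
    env["CLEARML_AGENT_DISABLE_SSH_MOUNT"] = "1"
    argv = [
        "clearml-agent", "daemon",
        "--queue", queue,
        "--gpus", gpus,
        "--foreground",
        "--docker", docker_image,
    ]
    return env, argv


# Usage (not executed here):
#   import subprocess
#   env, argv = agent_launch_spec("gpu", "0", "<gitlab org registry>/project-precog/clearml_config")
#   subprocess.run(argv, env=env, check=True)
```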
It does not use key auth, instead sets up some weird password and then fails to auth:
AdventurousButterfly15 it SSHes into the container. Inside the container it sets up a new SSH daemon with a new, random, very long password.
It will not SSH into the host machine (i.e. the agent needs to run in docker mode, not venv mode). Make sense?
Hmm, can you share the log of the Task (the Task that clearml-session created)?
Sure, will send in a few min when it executes
AgitatedDove14
made a new one:
https://pastebin.com/LxLFk7py
How are you spinning the agent machine ?
Basically, port 10022 on the host (the agent machine) is routed into the container, but it still needs to be open on the host machine. Could it be behind a firewall? Are you (the client side running clearml-session) on the same network as the machine running the agent?
But it’s running in docker mode and it is trying to ssh into the host machine and failing
It is not SSHing into the host machine, it is SSHing directly into the container.
Notice that the port it is SSHing to is 10022, which is mapped into the container.
All ports are open (both agent machine and client machine are working within same VPN)
Is there a way to check if the port is accessible from my local machine?
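One way to check this from the client side is to open a TCP connection to the port and read the first bytes the server sends; an SSH daemon announces itself with a banner starting with `SSH-2.0-`. Below is a minimal sketch using only the Python standard library. The function name `probe_ssh_port` is made up for this example, and the address in the usage comment is the one from the telnet test in this thread.

```python
import socket


def probe_ssh_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first line of data the server sends.

    Raises OSError (including timeouts) if the port cannot be reached,
    for example because a firewall drops the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        banner = conn.recv(256)  # an SSH server sends its version banner first
    return banner.decode("ascii", errors="replace").strip()


# Usage (not executed here):
#   try:
#       print(probe_ssh_port("192.168.1.55", 10022))
#   except OSError as exc:
#       print(f"port not reachable: {exc}")
```

A returned banner only shows that the port is reachable and that an SSH daemon answers on it; it says nothing about whether authentication will succeed.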