Update: I see that by default it uses 10022 as the remote SSH port, so I’ve opened that as well (still getting the “tunneling failed” message though).
I’ve also noticed this log on the agent machine: `2021-07-09 05:38:37,766 - clearml - WARNING - Could not retrieve remote configuration named 'SSH' Using default configuration: {'ssh_host_ecdsa_key': '-----BEGIN EC PRIVATE KEY-----{private key here}`
Hi QuaintPelican38
Assuming you have opened the default SSH port 10022 on the EC2 instance (and assuming the AWS permissions are set so that you can access it), you need to use the `--public-ip` flag when running clearml-session. Otherwise it "thinks" it is running on a local network and registers itself with its local IP. With the flag on, it gets the public IP of the machine, so the clearml-session running on your machine can connect to it.
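To illustrate why the flag matters, here is a minimal Python sketch (not part of clearml-session) that checks whether a registered address falls in a private range, such as an EC2 instance's VPC address. A private address is normally reachable only from inside the same network, which is the situation `--public-ip` is meant to avoid. The helper name `needs_public_ip` and the example addresses are hypothetical placeholders.

```python
import ipaddress


def needs_public_ip(registered_address: str) -> bool:
    """Return True if the address is private (e.g. an RFC 1918 VPC address),
    meaning a client outside that network most likely cannot reach it."""
    return ipaddress.ip_address(registered_address).is_private


# Hypothetical example addresses, not taken from this thread
for addr in ["172.31.5.10", "8.8.8.8"]:
    status = "private, needs --public-ip" if needs_public_ip(addr) else "public"
    print(addr, status)
```

If the IP the session registered (for example the host in `jupyter_url`) comes back as private, a client outside the VPC will not be able to tunnel to it.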
Makes sense?
Totally! Thanks so much AgitatedDove14, I’ll try that out now.
Unfortunately, no dice 😕 I’ve opened every port from 0 to 11000 and am using the command `clearml-session --public-ip true` on the client, but I’m still getting the timeout message, only now it says:
` Setting up connection to remote session
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of known hosts.
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of known hosts. `
And it repeats that second block every 3 seconds.
Hi QuaintPelican38, can you manually access the machine using the IP it registered?
(Look under the DevOps project: you'll see a running Task called "interactive session". In its Configuration tab, under User Properties, you should find the IP.)
Oh that’s cool, I assumed the DevOps project was just examples!
There’s a `jupyter_url` property there that is `http://{instance's_private_ip_address}:8888?token={jupyter_token}`
There’s also:
- external_address: {instance_public_ip_address}
- internal_ssh_port: 10022
- internal_stable_ssh_port: 10023
- jupyter_port: 8888
- jupyter_token: {jupyter_token}
- vscode_port: 9000
Maybe this is something stupid to do with VPCs that I should understand better!
Hi QuaintPelican38
Can you SSH to {instance_public_ip_address}:10022 (something like `ssh -p 10022 user@IP_HERE`)?
Basically, if you just get the password prompt, you are okay.
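If an interactive SSH attempt is inconvenient, a plain TCP connection test gives a similar first signal: if connecting to port 10022 succeeds, the port is reachable and no security group or firewall is blocking it. This only confirms network reachability, not SSH login. A minimal Python sketch; the helper name `port_is_reachable` and the 5-second timeout are arbitrary choices, not part of ClearML.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try to open a TCP connection to host:port.

    True means something accepted the connection (e.g. the session's SSH server).
    False covers refused connections, timeouts and DNS failures, all of which
    are OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (placeholder host: use the external_address from the Task's user properties):
# port_is_reachable("INSTANCE_PUBLIC_IP", 10022)
```

A timeout (rather than an immediate refusal) usually points at a firewall or security group silently dropping packets.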
I suspect you have some AWS security setting (firewall) that prevents direct access to the instance. Could that be it?
Hi AgitatedDove14, thanks for your help, and sorry I missed this! I’ve had this on hold for the last few days, but I’m going to try firing up a new ClearML server running version 1.02 (I’ve been using the slightly older Allegro Trains image from the AWS Marketplace) and have another try from there. Thanks for your help on GitHub too ❤ I’m so blown away by the quality of everything you folks are doing, and I’ve been championing it hard at my workplace.