PS: the agent is in docker mode, so I wonder why it uses the host mapping for the clearml cache folder
Okay, that was because it wasn’t in docker mode for this reproduction
I used the Nvidia PyTorch container 22.04 instead of the default one, and also tried to add JupyterLab (I opened up the default ports in the Azure console). The task seems successful, but there is still no SSH tunnel.
GrittyStarfish67 , which version of ClearML & ClearML-Session are you using?
Sure. Debug + docker mode, and then start a clearml session?
GrittyStarfish67 , can you please try running the agent with --debug?
CostlyOstrich36 I ran it using the default docker, and there is still a tunnel problem. This is what I got eventually:
` Creating config file /etc/ssh/sshd_config with new version
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:TTE+YCJmi2NOpH/ykzdHiP+MgCfKkZXocwUyu58GuAA root@Merlin-dev (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:ks6yr6FpKp5pyLU9NRLK/K96BYieuivwqw7RKAaQHIA root@Merlin-dev (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:0JxVsacRc3A5zbfyrZNka5GpSPAGHKWQ7Q76JXPVIsQ root@Merlin-dev (ED25519)
Created symlink /etc/systemd/system/sshd.service → /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service → /lib/systemd/system/ssh.service.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up python3-requests (2.18.4-2ubuntu0.1) ...
Setting up ssh-import-id (5.7-0ubuntu1.1) ...
Processing triggers for systemd (237-3ubuntu10.53) ...
Processing triggers for dbus (1.12.2-1ubuntu1.3) ...
Processing triggers for libc-bin (2.27-3ubuntu1.5) ...
W: Download is performed unsandboxed as root as file '/var/cache/apt/archives/partial/libcap2_1%3a2.25-1.2_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
SSH Server running on Merlin-dev [10.2.0.11] port 10022
LOGIN u:root p:38c96d79d908c64fae672743bba2cbf50ecd47ccb41a3b7394b55759166cdfec `
CostlyOstrich36 I'm seeing an awful lot of these:
DEBUG:urllib3.connectionpool:Resetting dropped connection: api.clear.ml
AgitatedDove14 CostlyOstrich36 yes! That did the trick. I added port 10022 in the Azure networking pane and the session is now working!!
I'm assuming you cannot directly access port 10022 (default ssh port on the remote machine) from your local machine, hence the connection issue. Could that be?
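One quick way to check this from the local machine is a plain TCP connect test. Below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper (not part of ClearML or ClearML-Session), and the host in the commented usage line is a placeholder. Note that the `10.2.0.11` address printed in the log above is a private address, so the VM's public IP or DNS name would be needed when testing from outside the Azure network.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it checks raw TCP
    reachability and does not attempt an SSH handshake.
    """
    try:
        # create_connection resolves the host and performs the TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


# Example usage (placeholder host, replace with the VM's public address):
# print(is_port_open("<remote-public-ip>", 10022))
```

If this returns `False` while the SSH server reports it is running on port 10022, a firewall or network security rule between the two machines is a likely suspect, although a timeout alone does not prove where the block is.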