Answered
Hi, Trying To Debug

Hi, I'm trying to debug clearml-session on an Azure instance. I cannot seem to connect to it.

HELLO FELLOW AZURE USER FROM THE FUTURE: A SOLUTION WAS FOUND: CREATE TCP RULE FOR PORT 10022 INBOUND

Right now I’ve boiled it down to the basics to try and find out what’s going on.

My command is
clearml-session --vscode-server false --jupyter-lab false --queue default --public-ip
I tried with and without --public-ip; either way it stays stuck permanently on:
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel

I think the problem is in the initial setup of the session task. Here is the output to the DevOps task console:
ClearML results page: `
2022-06-12 08:16:12,547 - clearml - WARNING - Could not retrieve remote configuration named 'SSH'
Using default configuration: {'ssh_host_ecdsa_key': '-----BEGIN EC PRIVATE KEY-----*\n-----END EC PRIVATE KEY-----\n', 'ssh_host_ed25519_key': '-----BEGIN OPENSSH PRIVATE KEY-----**\n-----END RSA PRIVATE KEY-----\n', 'ssh_host_rsa_key__pub': 'ssh-rsa ****', 'ssh_host_ecdsa_key__pub': 'ecdsa-sha2-nistp256 ****', 'ssh_host_ed25519_key__pub': None}
Applying vault configuration
Installing SSH Server on Merlin-dev [10.2.0.11]
Unable to load host key "/datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub": invalid format
Unable to load host key: /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub
Unable to load host key "/datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub": invalid format
Unable to load host key: /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub
Unable to load host key: /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ed25519_key.pub

SSH Server running on Merlin-dev [10.2.0.11] port 10022

LOGIN u:root p:None

no jupyterlab to monitor - going to sleep `
This is the sshd_config created:
` PermitRootLogin yes
ClientAliveInterval 10
ClientAliveCountMax 20
AllowTcpForwarding yes
UsePAM yes
AuthorizedKeysFile /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/authorized_keys
PidFile /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/sshd.pid
AcceptEnv CLEARML_API_ACCESS_KEY CLEARML_API_SECRET_KEY CLEARML_API_ACCESS_KEY CLEARML_API_SECRET_KEY
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ed25519_key
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_rsa_key.pub
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ecdsa_key.pub
HostKey /datadrive/clearml_cache/venvs-builds/3.8/code/.clearml_session_sshd/ssh_host_ed25519_key.pub `

  
  
Posted one year ago

Answers 14


GrittyStarfish67, which version of ClearML & ClearML-Session are you using?

  
  
Posted one year ago

Following

  
  
Posted one year ago

CostlyOstrich36 I ran using the default docker, still a tunnel problem. This is what I got eventually:
` Creating config file /etc/ssh/sshd_config with new version
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:TTE+YCJmi2NOpH/ykzdHiP+MgCfKkZXocwUyu58GuAA root@Merlin-dev (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:ks6yr6FpKp5pyLU9NRLK/K96BYieuivwqw7RKAaQHIA root@Merlin-dev (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:0JxVsacRc3A5zbfyrZNka5GpSPAGHKWQ7Q76JXPVIsQ root@Merlin-dev (ED25519)
Created symlink /etc/systemd/system/sshd.service → /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service → /lib/systemd/system/ssh.service.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up python3-requests (2.18.4-2ubuntu0.1) ...
Setting up ssh-import-id (5.7-0ubuntu1.1) ...
Processing triggers for systemd (237-3ubuntu10.53) ...
Processing triggers for dbus (1.12.2-1ubuntu1.3) ...
Processing triggers for libc-bin (2.27-3ubuntu1.5) ...
W: Download is performed unsandboxed as root as file '/var/cache/apt/archives/partial/libcap2_1%3a2.25-1.2_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)

SSH Server running on Merlin-dev [10.2.0.11] port 10022

LOGIN u:root p:38c96d79d908c64fae672743bba2cbf50ecd47ccb41a3b7394b55759166cdfec `

  
  
Posted one year ago

PS: the agent is in docker mode, so I wonder why it uses the host mapping for the clearml cache folder.
Okay, that was because it wasn’t in docker mode for this reproduction.

  
  
Posted one year ago

GrittyStarfish67, can you please try running the agent with --debug?

  
  
Posted one year ago

CostlyOstrich36 seeing an awful lot
DEBUG:urllib3.connectionpool:Resetting dropped connection: api.clear.ml

  
  
Posted one year ago

sure. debug + docker mode and then start a clearml session?

  
  
Posted one year ago

I'm assuming you cannot directly access port 10022 (default ssh port on the remote machine) from your local machine, hence the connection issue. Could that be?
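
One quick way to test that assumption from the local machine is a plain TCP connection attempt to the port. The following is only a minimal sketch: the function name `can_reach` and the placeholder address in the usage comment are illustrative assumptions, not part of clearml-session.

```python
import socket


def can_reach(host: str, port: int = 10022, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and unreachable hosts.
        return False


# Hypothetical usage: replace the placeholder with the remote machine's public IP.
# print(can_reach("203.0.113.10"))
```

If this returns False while the task log reports that the SSH server is running on port 10022, something between the two machines (for example a firewall or cloud network rule) is probably blocking the port.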

  
  
Posted one year ago

that’s all

  
  
Posted one year ago

Hurray!

  
  
Posted one year ago

Used the Nvidia PyTorch container 22.04 instead of the default one, and also tried enabling JupyterLab (opened up the default ports on the Azure console). The task seems successful, but still no SSH tunnel.

  
  
Posted one year ago

Yes please 🙂

  
  
Posted one year ago

AgitatedDove14 CostlyOstrich36 yes! That did the trick. I added port 10022 in the Azure networking pane and the session is now working!!

  
  
Posted one year ago

clearml==1.4.1 clearml-session==0.3.6

  
  
Posted one year ago