
clearml-session question:
I’m using the tool with an on-prem machine. Normal tasks execute normally, but when using clearml-session I intermittently get an SSH connection error.
Sometimes it works fine, but sometimes I get this error message: SSH tunneling failed, retrying in 3 seconds
This is the command I used:

clearml-session --project examples --queue default --jupyter-lab true --vscode-server false \
  --remote-gateway <internal-ip> \
  --skip-docker-network \
  --docker gcr.io/deeplearning-platform-release/pytorch-gpu \
  --username <ssh-username> \
  --password <ssh-password> \
  --verbose

clearml-session version: 0.4.0

  
  
Posted 3 months ago

Answers 13


Yes, sure. This is what I see in the logs:

> Setting up openssh-sftp-server (1:8.2p1-4ubuntu0.5) ...
> Setting up python3-distro (1.4.0-1) ...

Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
  
  
Posted 3 months ago

The thing is, when I connect with plain SSH there are no issues:

ssh user@ip

I’m trying to connect from Mac to Linux @<1523701070390366208:profile|CostlyOstrich36>

clearml-agent is installed on another machine in the internal network @<1523701205467926528:profile|AgitatedDove14>

  
  
Posted 3 months ago

I mean SSH through the terminal works fine.
The issue is with clearml-session.

I tried removing the username/password and remote host yesterday, but then it asked me for the password when connecting and did not accept it.

  
  
Posted 3 months ago

@<1523701205467926528:profile|AgitatedDove14> @<1523701070390366208:profile|CostlyOstrich36> Thanks for the help!

  
  
Posted 3 months ago

[Image attachment not preserved]

  
  
Posted 3 months ago

It had cached my SSH parameters; after I removed all of them, it finally worked.

  
  
Posted 3 months ago

> 2023-02-15 12:49:22,813 - clearml - WARNING - Could not retrieve remote configuration named 'SSH'

This is fine; it means it uses the default identity keys.
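If you want to confirm there is actually a default key to fall back on, a minimal sketch can check which of OpenSSH's standard default private-key files exist on the client. This assumes that "default identity keys" here means those standard OpenSSH file names; it is not clearml-session's own lookup logic, and `find_default_identities` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Optional

# Common OpenSSH default identity (private key) file names, as listed in ssh(1)
# for the -i option. Older OpenSSH releases also tried id_dsa.
DEFAULT_IDENTITY_FILES = (
    "id_rsa",
    "id_ecdsa",
    "id_ecdsa_sk",
    "id_ed25519",
    "id_ed25519_sk",
)


def find_default_identities(home: Optional[Path] = None) -> List[Path]:
    """Return the default private-key files that exist under <home>/.ssh.

    `home` defaults to the current user's home directory.
    """
    ssh_dir = (home if home is not None else Path.home()) / ".ssh"
    # Keep the order of DEFAULT_IDENTITY_FILES; skip names that are not regular files.
    return [ssh_dir / name for name in DEFAULT_IDENTITY_FILES if (ssh_dir / name).is_file()]
```

Calling `find_default_identities()` with no argument inspects the current user's `~/.ssh`; an empty list would suggest there is no default key available for the fallback.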

> The thing is - when I try to connect with normal SSH there are no issues

Now I'm lost. When exactly do you see the issue?

  
  
Posted 3 months ago

Hmm, any suggestions for making this more visible, or something in the interface? (Deleting the cache file is always a solution, but it sounded quite painful to debug, hence the question.)

  
  
Posted 3 months ago

> It seems like the configuration is cached in a way even when you change the CLI parameters.

@<1523704461418041344:profile|EnormousCormorant39> nice!
Yes, the configuration is cached so that once you set it, you can call clearml-session again without all the arguments.
What was the actual issue? Should we add something to the printout?
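To see what was cached before deleting anything, a small script can read the state file directly. This is a sketch under loud assumptions: `~/.clearml_session.json` is assumed to be the default location of clearml-session's stored state (verify with `clearml-session --help` on your version), the file is assumed to be a flat JSON object, and the helper names and any key names are hypothetical. Deleting the whole file remains the simplest reset.

```python
import json
from pathlib import Path
from typing import Iterable, List

# ASSUMPTION: default location of clearml-session's stored state; verify with
# `clearml-session --help` (look for the option that sets the state/config file).
ASSUMED_STATE_FILE = Path.home() / ".clearml_session.json"


def load_cached_state(state_file: Path = ASSUMED_STATE_FILE) -> dict:
    """Return the cached state as a dict, or {} if the file does not exist.

    Assumes the file holds a flat JSON object (hypothetical layout).
    """
    if not state_file.is_file():
        return {}
    return json.loads(state_file.read_text())


def drop_cached_keys(keys: Iterable[str], state_file: Path = ASSUMED_STATE_FILE) -> List[str]:
    """Delete the given top-level keys from the cached state; return those removed."""
    state = load_cached_state(state_file)
    # dict.fromkeys de-duplicates while preserving the caller's order.
    removed = [key for key in dict.fromkeys(keys) if key in state]
    for key in removed:
        del state[key]
    if removed:
        # Rewrite the file only when something actually changed.
        state_file.write_text(json.dumps(state, indent=2))
    return removed
```

For example, `drop_cached_keys(["username", "password"])` would remove those entries only if the file really uses such key names (hypothetical); back up the file first.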

  
  
Posted 3 months ago

Now I see an interesting warning:

2023-02-15 12:49:22,813 - clearml - WARNING - Could not retrieve remote configuration named 'SSH'
  
  
Posted 3 months ago

> Sometimes it is working fine, but sometimes I get this error message

@<1523704461418041344:profile|EnormousCormorant39> can I assume there is a gateway at --remote-gateway <internal-ip>?
Could it be that this gateway has a network firewall blocking some of the traffic?
If this is all on the local network, why do you need to pass --remote-gateway?
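One way to test the firewall hypothesis is to probe the gateway's TCP port repeatedly from the client and count failures, loosely mirroring the 3-second retry in the error message. A minimal sketch using only the Python standard library; `probe_tcp` is a hypothetical helper, and port 22 is a placeholder for whichever port the tunnel actually targets. It checks only the TCP handshake, not SSH authentication.

```python
import socket
import time
from typing import Tuple


def probe_tcp(host: str, port: int = 22, attempts: int = 10,
              delay: float = 3.0, timeout: float = 5.0) -> Tuple[int, int]:
    """Try to open a TCP connection `attempts` times; return (successes, failures)."""
    ok = failed = 0
    for i in range(attempts):
        try:
            # Resolves the host and completes the TCP handshake only;
            # it does not speak the SSH protocol or authenticate.
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:  # refused, timed out, unreachable, or DNS failure
            failed += 1
        if i < attempts - 1:
            time.sleep(delay)  # wait between attempts, like the tool's retry message
    return ok, failed
```

For example, `probe_tcp("<internal-ip>", 22, attempts=20)` (replace the placeholder). A mix of successes and failures would point toward the network path rather than clearml-session itself, while all successes would not rule out an SSH-level problem.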

  
  
Posted 3 months ago

I finally figured out the issue.
It seems the configuration is cached, even when you change the CLI parameters.
After adding an explicit JSON configuration, I managed to run it.

  
  
Posted 3 months ago

Hi @<1523704461418041344:profile|EnormousCormorant39>, is there any chance this is indeed network-related, given that it does work sometimes?

Can you share a larger portion of the log with the errors?

Also, what type of machines are these? Linux to Linux?

  
  
Posted 3 months ago