Answered
Discovered An Issue With Clearml-Session Where We Have The Agents Running Within A Tailscale Network. When The Clearml Session Is Local On The Same Physical Network, Connections Work Fine. But When We Are On The Virtual Network, They Don't Work

I discovered an issue with clearml-session when the agents are running within a Tailscale network.

When the clearml-session is local on the same physical network, connections work fine. But when we are on the virtual network, they don't work.

  
  
Posted one year ago

Answers 14


Hi @<1535069219354316800:profile|PerplexedRaccoon19> can you please elaborate on the issue?

  
  
Posted one year ago

This is the issue

Setting up connection to remote session
Starting SSH tunnel to root@192.168.1.185, port 10022
SSH tunneling failed, retrying in 3 seconds
  
  
Posted one year ago

So the 192.xxxx network is the physical network, not the Tailscale network.

  
  
Posted one year ago

And where is the agent running?

  
  
Posted one year ago

@<1535069219354316800:profile|PerplexedRaccoon19> clearml-session connects to the session using the IP published on the task by the code running as part of the session task. So this is basically an issue of which IP is visible from within the container where the session code is running remotely.
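
For anyone debugging this, a quick way to see whether the address published on the task is reachable from the client side is a plain TCP check. This is only a minimal sketch: `can_reach` is a made-up helper, not part of clearml-session, and the host and port in the usage comment are just the values from the log above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper, not part of clearml-session.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage with the address from the log above:
#   can_reach("192.168.1.185", 10022)
```

If this returns False from a machine that is only on the Tailscale network, the published IP is not routable from there, which would match the SSH tunneling failure.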

  
  
Posted one year ago

I think I'm running into the same issue, using the webapp and the AWS Autoscaler app. Everything gets started up properly (as can be seen in the instance/experiment logs), but SSH fails, seemingly timing out. I also tried clearml-session --public-ip True and got the same issue (though the IP that it attempts to connect to evidently changes).

  
  
Posted one year ago

Would love to know if there's a fix. It's currently blocking my use case (Jupyter notebooks hosted on EC2).

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Could you elaborate? I believe both IPs are visible to the container.

This is making things slightly complicated, because now I have to introduce a jump host for people who aren't on the same physical network but are on the same Tailscale network.

  
  
Posted one year ago

@<1636537836679204864:profile|RipeOstrich93>, can you make sure that the Additional ClearML Configuration for the autoscaler app includes agent.extra_docker_arguments: ["--ipc=host", ]?

  
  
Posted 11 months ago

@<1535069219354316800:profile|PerplexedRaccoon19> can you verify the container uses the same docker arg as specified in the previous message?

  
  
Posted 11 months ago

In the end I forked the clearml-session library and removed mechanisms to access the interactive terminal. I added ipc=host.

There's one identifiable issue with clearml-session + Tailscale though: while it does launch the daemon properly, it registers the wrong IP address on the task (sometimes the external IP address, even when --external is not passed). At the end of the day, if we know which machine it was launched on, we're able to replace that IP address with its Tailscale equivalent and still connect. When ipc=host is active, we're able to query the network interfaces, and if there is a Tailscale network interface (typically tailscale0), we can query it to get its IP address and register that with the task. This could possibly be exposed as an arg in the CLI, something like clearml-session --docker .... --tailscale
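
As a rough sketch of the interface lookup described above (not the actual clearml-session code): the helper names below are hypothetical, and it assumes a Linux host with the iproute2 `ip` command available and an interface named `tailscale0`.

```python
import subprocess
from typing import Optional


def parse_ipv4_from_ip_output(output: str, interface: str = "tailscale0") -> Optional[str]:
    """Extract the IPv4 address of `interface` from `ip -o -4 addr show` output.

    Hypothetical helper. Each line is expected to look like:
    "<index>: <ifname>    inet <addr>/<prefix> ..."
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == interface and fields[2] == "inet":
            # Drop the "/<prefix>" suffix and keep only the address.
            return fields[3].split("/")[0]
    return None


def get_tailscale_ip(interface: str = "tailscale0") -> Optional[str]:
    """Return the IPv4 address of the Tailscale interface, or None if unavailable."""
    try:
        result = subprocess.run(
            ["ip", "-o", "-4", "addr", "show"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # `ip` is missing or failed; the caller should fall back to the default IP.
        return None
    return parse_ipv4_from_ip_output(result.stdout, interface)
```

If `get_tailscale_ip()` returns an address, that is the value that would be registered on the task instead of the physical or external IP. Otherwise the existing behaviour would be kept.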

I'm happy to work on a PR if you are interested.

  
  
Posted 11 months ago

I actually ran into the exact same problem. The agents aren't hosted on AWS though, just on an in-house server.

  
  
Posted 10 months ago

Want to work on it together?

  
  
Posted 10 months ago

Sure. I'm in Europe but we can also test things async.

  
  
Posted 10 months ago