Hello Again, How Can I Use The

Hello again, how can I use the clearml-session package? I can't seem to find any helpful docs or resources.
If I have a machine with IP=certain_IP and I have SSH access to it, how can I use clearml-session to launch JupyterLab / VSCode on this remote machine?
I tried running clearml-session on my local machine and on the target machine, and it takes forever:

  
  
Posted 3 years ago

Answers 17


AgitatedDove14 Hi, for remote machine, I'm switching to Ubuntu server + docker + NVIDIA GPU, instead of using Windows. I run the clearml-agent with docker on the Ubuntu server.

Now everything looks fine on the server after I started the clearml-session on my laptop, which means SSH/VSCode/Jupyter servers are created and I got the URLs.

However, on my laptop it is showing error:
Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
ssh: connect to host 172.17.0.2 port 10022: No route to host

Any idea how to resolve it? I don't know why I'm getting 172.17.0.2 while the IP of my remote machine is 10.19.20.15 . Is 172.17.0.2 an internal IP only accessible from inside the running docker? If so, how do I expose the IP to my laptop?

  
  
Posted 3 years ago

AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.

Basic setup:
A machine running clearml-agent (this is the "remote machine")
A machine running clearml-session (let's call it laptop 🙂 )
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default ). Once the agent is running on the remote machine, run clearml-session from your laptop, select the default queue (the one the remote machine is listening to), and wait until you get the http links.

  
  
Posted 3 years ago

AgitatedDove14 Hi, thanks for the response.

I tried to change the IP address as indicated above, but now clearml-session is showing the error:
ssh: connect to host 10.19.20.15 port 10022: Connection refused
Info to help you reproduce FYI:
clearml-session: version 0.3.2
Ubuntu: version 20.04.2 LTS
docker specified for the interactive session: nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
command-line used to spin the agent: clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

  
  
Posted 3 years ago

Our remote machine is Windows 10

JumpyDragonfly13 seems like the Windows 10 + docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100

  
  
Posted 3 years ago

Hi AgitatedDove14

I tried the commands you suggested. The first command works fine, but the second command failed with the following message:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Our remote machine is Windows 10 running docker (with WSL2), which does not seem to support NVIDIA GPUs yet. Is that why the 2nd command failed?

We'd like to try the case without docker. Please advise how to do it, thanks!

  
  
Posted 3 years ago

AgitatedDove14 Yes I have an agent running. Otherwise, it would keep running at "Waiting for remote machine allocation . [Status]"

I do not know how to check the TCP connection?

BTW, I just tried the command clearml-session again, and now it stops with the error "docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].". Does this mean I cannot use a remote machine without a GPU?

  
  
Posted 3 years ago

I actually read that documentation, but more specifically I need an example of how to do it, if possible. As I mentioned, I tried to run clearml-session but it takes forever and nothing happens.

  
  
Posted 3 years ago

Hi JumpyDragonfly13 , just making sure, do you have an agent running on a remote machine ?
Can you have a direct TCP connection to the remote machine (the default port it will use is 10022)
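One quick way to check for a direct TCP connection from the laptop is a few lines of Python. This is only a minimal sketch using the standard library; the function name can_connect and the 3 second timeout are arbitrary choices for illustration, not part of clearml-session. The default port 10022 is the one mentioned above.

```python
import socket


def can_connect(host: str, port: int = 10022, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection performs the TCP handshake; the socket is closed
        # again as soon as the "with" block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", "no route to host", DNS failures
        # and timeouts.
        return False
```

Calling can_connect("<remote machine IP>") from the laptop returning False would suggest the port is closed, firewalled, or not routable. Returning True only shows that something accepts connections on that port, not that the SSH session itself is healthy.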

  
  
Posted 3 years ago

AgitatedDove14 Yes thanks, it seems relevant. So, how to run without docker? We'd like to try it without docker first.

  
  
Posted 3 years ago

Hi JumpyDragonfly13
Let's assume we have two machines, one we call remote, one we call laptop (at least for this discussion)

On the Remote machine we need to run: (notice we must have docker preinstalled on the remote machine, it can work without docker, let me know if this is the case for you)
clearml-agent daemon --queue interactive --create-queue --docker
On the Laptop we run
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
What clearml-session will do is create a "Task" and enqueue it on the "interactive" queue.
Then the Agent (on the remote machine) will take the "Task", spin up the docker, create JupyterLab & VSCode-server inside the docker, and return links for us to connect to the Remote machine (notice the links are http://localhost because they are automatically tunneled over the SSH connection that clearml-session created for us in the background)
Make sense?
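For anyone scripting this, the two commands can be assembled as argument lists. This is just a sketch: the helper names agent_command and session_command are made up for illustration, and only flags that appear in this thread are used (--queue, --create-queue, --docker).

```python
from typing import List, Optional


def agent_command(queue: str = "interactive", docker: bool = True) -> List[str]:
    """Command to run on the remote machine (hypothetical helper)."""
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--create-queue"]
    if docker:
        # Leave this flag out to run the agent without docker.
        cmd.append("--docker")
    return cmd


def session_command(docker_image: Optional[str] = None) -> List[str]:
    """Command to run on the laptop (hypothetical helper)."""
    cmd = ["clearml-session"]
    if docker_image:
        cmd += ["--docker", docker_image]
    return cmd
```

Each list can be passed to subprocess.run on the matching machine. The agent command has to be running on the remote machine before the session command is started on the laptop.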

  
  
Posted 3 years ago

Hi JumpyDragonfly13

I don't know why I'm getting 172.17.0.2

I think it (the remote jupyter Task) fails to get the correct IP address of the server.
You can manually correct it by going to the DevOps project and looking for the running Task there, then under Configuration/Properties change external_address to the actual IP 10.19.20.15
Once that is done, re-run the clearml-session , it will suggest to connect to the running session, it should work....
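As a side note on where 172.17.0.2 probably comes from: it falls inside 172.17.0.0/16, which is the subnet Docker uses for its default bridge network, so it is most likely the container's internal address rather than the host's. A small sketch to check an address against that range, assuming the remote machine uses Docker's default networking (the function name is made up for illustration):

```python
import ipaddress

# Docker's default bridge subnet. Assumption: the remote machine has not
# been reconfigured to use a different bridge range.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def in_docker_default_bridge(address: str) -> bool:
    """Return True if the address lies in Docker's default bridge subnet."""
    return ipaddress.ip_address(address) in DOCKER_DEFAULT_BRIDGE
```

With the two addresses from this thread, 172.17.0.2 is inside the range and 10.19.20.15 is not, which fits the laptop being unable to route to the first one.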

BTW:
I'd like to see if we can fix this issue, and it will be helpful to try to reproduce the server setup
clearml-session version?
Ubuntu version ?
What's the docker you specified for the interactive session?
What's the command-line you are using to spin the agent ?

  
  
Posted 3 years ago

Hi AgitatedDove14 , I am trying to run clearml-session on my laptop, but it seems to keep running at "Waiting for environment setup to complete [usually about 20-30 seconds]" for several minutes. How could I debug and resolve it?

I do not see any error in https://app.community.clear.ml/projects/368fb3c4fcdd419e8b597ed100c29d69/experiments/bf78f1c303c74062986384cd74f0e542/info-output/log?columns=selected&columns=type&columns=name&columns=status&columns=project.name&columns=users&columns=started&columns=last_update&columns=last_iteration&columns=active_duration&order=last_update , and I can access the Jupyter, but I did not see access information of VSCode server and SSH server. Not sure what the issue is?

  
  
Posted 3 years ago

Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses clearml-agent to launch a session on one of the machines in the cluster.
In the remote session itself it installs jupyterlab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over ssh and creates tunnels to these services.

  
  
Posted 3 years ago

Hi JumpyDragonfly13

  1. Is "10.19.20.15" accessible from your machine (i.e. can you ping it)?
  2. Can you manually SSH to 10.19.20.15 on port 10022 ?
  
  
Posted 3 years ago

Okay i will try it, thank you very much

  
  
Posted 3 years ago

Sure thing 🙂

  
  
Posted 3 years ago

Basically run the agent in virtual environment mode. JumpyDragonfly13 try this one (notice no --docker flag):
clearml-agent daemon --queue interactive --create-queue
Then from the "laptop" try to get a remote session with:
clearml-session

  
  
Posted 3 years ago