Answered
Hello, I Tried The Clearml-Session CLI To Start A Jupyter Instance On An Agent, But Got An Error With The Password

Hello, I tried the clearml-session CLI to start a Jupyter instance on an agent, but got an error with the password. Here is the full CLI log:
` $ clearml-session --git-credentials true --queue aws --vscode-server false --user-folder "~/projects/1" --requirements reqs.txt --username h4
clearml-session - CLI for launching JupyterLab / VSCode on a remote machine
Verifying credentials
Use previous queue (resource) 'aws' [Y]/n? Y

Interactive session config:
{
"base_task_id": null,
"git_credentials": true,
"jupyter_lab": true,
"packages": [
"git+ ",
],
"password": "c3427219763d5c01f75fd49c8dc5dc822f3fa565315ef2fb649c8dc5dc55f3e82675e8803",
"project": "1",
"queue": "aws",
"user_folder": "~/projects/1",
"username": "h4",
"vscode_server": false
}

Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
New session created [id=c4d36b9df6d5f26a4e07b926d5aceb39]
Waiting for remote machine allocation [id=c4d36b9df6d5f26a4e07b926d5aceb39]
.Status [queued]
..Status [in_progress]
Remote machine allocated
Setting remote environment [Task id=c4d36b9df6d5f26a4e07b926d5aceb39]
Setup process details:
Waiting for environment setup to complete [usually about 20-30 seconds]
............
Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
Warning: Permanently added '[10.x.x.x]:10022' (ECDSA) to the list of known hosts.
Password: c3427219763d5c01f75fd49c8dc5dc822f3fa565315ef2fb649c8dc5dc55f3e82675e8803

Password: Error: incorrect password
Please enter password manually: `
What is this password?

  
  
Posted 3 years ago

Answers 30


Not much info 😕

Can you run the docker command manually?

  
  
Posted 3 years ago

So I installed docker, added my user to the group allowed to run docker (so I don't have to run it with sudo, otherwise it fails), then ran these two commands and it worked.

  
  
Posted 3 years ago

or the ClearML UI?

  
  
Posted 3 years ago

the first problem I had, which didn't give any useful info, was that docker was not installed on the agent machine x)

JitteryCoyote63 you mean "docker" was not installed and it did not throw an error?

  
  
Posted 3 years ago

Alright, I have a follow-up question then: I used the param --user-folder "~/projects/my-project", but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?

Yes, you must make sure the docker can mount a persistent folder for you to work on.
Let me check what's the easiest way to do that.

  
  
Posted 3 years ago

Yes, docker was not installed on the machine

Okay, makes sense. We should definitely check that you have docker before starting the daemon 😉

Ok, it would be nice to have a --user-folder-mounted option that does the linking automatically

It might be misleading if you are running on a k8s cluster, where one cannot just -v mount a volume...
What do you think?
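
The check for docker before starting the daemon mentioned above could look roughly like the sketch below. This is not how clearml-agent does it; `docker_available` is a hypothetical helper, and it assumes that a successful `docker ps` is a good enough sign that the current user can reach the docker daemon without sudo.

```python
import shutil
import subprocess


def docker_available(executable="docker"):
    """Return (ok, message) describing whether docker looks usable.

    Hypothetical helper, not part of clearml-agent.
    """
    # Step 1: is the binary on PATH at all?
    path = shutil.which(executable)
    if path is None:
        return False, f"'{executable}' not found on PATH"

    # Step 2: can the current user talk to the daemon?
    # 'docker ps' fails when the daemon is down or the user lacks permission.
    try:
        result = subprocess.run(
            [path, "ps"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run '{executable} ps': {exc}"

    if result.returncode != 0:
        return False, f"'{executable} ps' failed: {result.stderr.strip()}"
    return True, f"docker is usable at {path}"


ok, message = docker_available()
print(message)
```

Running this on the agent machine before starting the daemon would have surfaced both problems from this thread that come before the GPU error: docker missing, and docker only working with sudo.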

  
  
Posted 3 years ago

👍

So nvidia-container-toolkit and systemctl restart dockerd fixed it?

  
  
Posted 3 years ago

Let's assume the host has a folder for all users for persistent storage, for example '/mnt/user_data/', and you have a user named 'myuser' and a matching subfolder '/mnt/user_data/myuser'.
Then we can do:
` clearml-session ... --docker "my_docker_image -v /mnt/user_data/:/host_mount/" --user-folder "/host_mount/myuser" `
BTW: The next time you call clearml-session these will become the default parameters, so no need to change anything 🙂
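
If you launch sessions for several users, the same command can be assembled in a small script. The sketch below only reuses the two flags shown above (`--docker` and `--user-folder`); `build_session_command` is a hypothetical helper, and the image name and paths are the example values from this post.

```python
import shlex


def build_session_command(image, host_root, container_root, user):
    """Build a clearml-session argument list that mounts host_root into the
    container and points the user folder at the user's subfolder.

    Hypothetical helper; only the --docker and --user-folder flags from the
    post above are used.
    """
    # The docker image and the extra "-v" mount go into a single --docker value.
    docker_arg = f"{image} -v {host_root}:{container_root}"
    # The user folder is expressed as a path inside the container.
    user_folder = f"{container_root.rstrip('/')}/{user}"
    return ["clearml-session", "--docker", docker_arg, "--user-folder", user_folder]


cmd = build_session_command(
    "my_docker_image", "/mnt/user_data/", "/host_mount/", "myuser"
)
# prints: clearml-session --docker 'my_docker_image -v /mnt/user_data/:/host_mount/' --user-folder /host_mount/myuser
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd)` as is, which avoids shell quoting issues around the `--docker` value.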

  
  
Posted 3 years ago

you mean "docker" was not installed and it did not throw an error?

Yes, docker was not installed on the machine

Yes you must make sure the docker can mount a persistent folder for you to work on.

Ok, it would be nice to have a --user-folder-mounted option that does the linking automatically

  
  
Posted 3 years ago

So that I don't lose what I worked on when stopping the session, and if I need to, I can ssh to the machine and directly access the content inside the user folder.

  
  
Posted 3 years ago

Can you try upgrading to the latest? pip install clearml-agent==0.17.2

  
  
Posted 3 years ago

Alright, I have a follow-up question then: I used the param --user-folder "~/projects/my-project", but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?

  
  
Posted 3 years ago

Awesome! Thanks! 🙏

  
  
Posted 3 years ago

Here are the logs of the agent :)
` (base) user@worker:~$ tail -f /tmp/.clearml_agent_daemon_outjdups8t2.txt
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false

+----------------------------------+--------+-------+
| id | name | tags |
+----------------------------------+--------+-------+
| 54e4a62a402d5135612ba7b12cfe4e57 | docker | |
+----------------------------------+--------+-------+

Starting infinite task polling loop...
tail: /tmp/.clearml_agent_daemon_outjdups8t2.txt: file truncated
task 44abb7247b0da367b6da05e65592cafb pulled from 54e4a62a402d5135612ba7b12cfe4e57 by worker office:worker-0:docker
Running task '44abb7247b0da367b6da05e65592cafb'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.qds4cx.txt', '/tmp/.clearml_agent_out.qds4cx.txt'
Running Task 44abb7247b0da367b6da05e65592cafb inside docker: nvidia/cuda:10.1-runtime-ubuntu18.04 --network host
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '--network', 'host', '-e', 'CLEARML_WORKER_ID=office:worker-0:docker', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04 --network host', '-v', '/home/user/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.3sh28jd6.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.gowxn7rp:/root/.ssh', '-v', '/home/user/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/user/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/user/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/user/.clearml/cache:/clearml_agent_cache', '-v', '/home/user/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvidia/cuda:10.1-runtime-ubuntu18.04', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip==20.2.3" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 44abb7247b0da367b6da05e65592cafb']
Running Docker:
Executing: ('docker', 'run', '-t', '--gpus', '"device=0"', '--network', 'host', '-e', 'CLEARML_WORKER_ID=office:worker-0:docker', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04 --network host', '-v', '/home/user/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.3sh28jd6.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.gowxn7rp:/root/.ssh', '-v', '/home/user/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/user/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/user/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/user/.clearml/cache:/clearml_agent_cache', '-v', '/home/user/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvidia/cuda:10.1-runtime-ubuntu18.04', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip==20.2.3" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 44abb7247b0da367b6da05e65592cafb')

DONE: Running task '44abb7247b0da367b6da05e65592cafb', exit status -1 `

  
  
Posted 3 years ago

Yes, it works now! Yay!

  
  
Posted 3 years ago

Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '--network', 'host', '-e', 'CLEARML_WORKER_ID=office:worker-0:docker', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04 --network host', '-v', '/home/user/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.toc3_yks.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.1dsz4bz8:/root/.ssh', '-v', '/home/user/.clearml/apt-cache.2:/var/cache/apt/archives', '-v', '/home/user/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/user/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/user/.clearml/cache:/clearml_agent_cache', '-v', '/home/user/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvidia/cuda:10.1-runtime-ubuntu18.04', 'bash', '-c', 'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; apt-get update ; apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0 ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip==20.2.3" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 25c4287a041503d09f2922264133b889']

  
  
Posted 3 years ago

Can you check the agent's logs? Maybe we can find something there.

  
  
Posted 3 years ago

from the ClearML UI

  
  
Posted 3 years ago

But I see in the agent logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...

  
  
Posted 3 years ago

I made some progress TimelyPenguin76. Now the task runs and I get this error from docker:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

  
  
Posted 3 years ago

this is the last line, same as before

  
  
Posted 3 years ago

this is from the agent?

  
  
Posted 3 years ago

When you start the ClearML agent, the last line it prints is the file for the agent's output. On Linux it should be something like:

Running CLEARML-AGENT daemon in background mode, writing stdout/stderr to /tmp/.clearml_agent_daemon_***.txt
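
If you no longer have that line on screen, a short script can look for the most recent file matching that name. This is only a sketch: it assumes the daemon output lives under /tmp with the `.clearml_agent_daemon_*.txt` pattern shown above, and `latest_agent_log` and `tail` are hypothetical helper names.

```python
import glob
import os
from collections import deque


def latest_agent_log(pattern="/tmp/.clearml_agent_daemon_*.txt"):
    """Return the most recently modified file matching pattern, or None."""
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def tail(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while reading.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


log_path = latest_agent_log()
if log_path is None:
    print("no agent daemon log found")
else:
    print(f"latest agent log: {log_path}")
    for line in tail(log_path):
        print(line)
```

This gives a one-off snapshot of the log, whereas `tail -f` on the same file keeps following it while a task runs.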

  
  
Posted 3 years ago

is there a command / file for that?

  
  
Posted 3 years ago

which ClearML agent version are you running?

  
  
Posted 3 years ago

0.17.1

  
  
Posted 3 years ago

I followed https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450 and now it seems to be setting up properly

  
  
Posted 3 years ago

Alright, how can I then mount a volume of the disk?

  
  
Posted 3 years ago

The first problem I had, which didn't give any useful info, was that docker was not installed on the agent machine x)

  
  
Posted 3 years ago