Answered
Hello! Getting Credential Errors When Attempting To Pip Install Transformers From Git Repo, On A Gpu Queue.

Hello! I'm getting credential errors when attempting to pip install transformers from its git repo on a GPU queue:
fatal: unable to write credential store: Device or resource busy

I've tried searching the Slack and Googling for this error with respect to ClearML, but I can't find anything.
Notably, the agent does seem able to clone different versions of my own repo, but it fails on pip/transformers specifically. I confirmed this by pushing a change to the repo and specifying that commit in the ClearML task.

Full error:
Replacing original pip vcs 'git+ ' with 'git+ '
Collecting transformers
Cloning :****@github.com/huggingface/transformers (to revision 61c506349134db0a0a2fd6fb2eff8e29a2f84e79) to /tmp/pip-install-9isrbw8m/transformers
Running command git clone -q ' :****@github.com/huggingface/transformers' /tmp/pip-install-9isrbw8m/transformers
fatal: unable to write credential store: Device or resource busy

1624890299001 Gandalf:gpu0 DEBUG Running command git checkout -q 61c506349134db0a0a2fd6fb2eff8e29a2f84e79
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
 command: /root/.clearml/venvs-builds/3.6/bin/python /root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpi3muv854
     cwd: /tmp/pip-install-9isrbw8m/transformers
Complete output (10 lines):
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
    main()
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 108, in get_requires_for_build_wheel
    backend = _build_backend()
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 99, in _build_backend
    obj = getattr(obj, path_part)
AttributeError: module 'setuptools.build_meta' has no attribute 'legacy'
----------------------------------------
ERROR: Command errored out with exit status 1: /root/.clearml/venvs-builds/3.6/bin/python /root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpi3muv854 Check the logs for full command output.
RequirementsManager handler <clearml_agent.helper.package.external_req.ExternalRequirements object at 0x7f2d9e039668> raised exception: Failed installing GIT/HTTPs package 'git+ '

clearml_agent: ERROR: Could not install task requirements!
Failed installing GIT/HTTPs package 'git+ '

Code used:
from clearml import Task
task = Task.init(project_name="IDX Phonemes", task_name="Test_Queue")
task.execute_remotely(queue_name="idx_gandalf_titan-rtx")
import torch
print(f"Cuda status: {torch.cuda.is_available()}")
import transformers
print(transformers)

Task "Installed Packages" (I edited the task manually in ClearML):
boto3
datasets
clearml
tokenizers
transformers @ git+
torch

  
  
Posted 3 years ago

Answers 30


My typos are killing us, apologies:
change -t to -it; that makes the session interactive (i.e. you can use bash 🙂)

  
  
Posted 3 years ago

Hi SmallDeer34
I need some help: what is the difference between the manual run and the automatic one?
From your previous log, this is the bash command executed inside the container. Can you go through it step by step and try to catch who/what is messing it up?
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip -v /home/dwhitena/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/dwhitena/.clearml/cache:/clearml_agent_cache -v /home/dwhitena/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash -c echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 1e876021bbef49a291d66ac9a2270705
As a first step, I would try:
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip -v /home/dwhitena/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/dwhitena/.clearml/cache:/clearml_agent_cache -v /home/dwhitena/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash
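To go through the command step by step, it can help to see the bind mounts as a list, so that each one can be dropped or checked in turn. The following is a minimal sketch, not part of ClearML: `bind_mounts` is a hypothetical helper that splits a `docker run` command line with `shlex` and collects the `-v host:container` pairs. The command in the example is a shortened copy of the one above.

```python
import shlex


def bind_mounts(docker_command):
    """Return (host_path, container_path) pairs for every -v/--volume flag."""
    tokens = shlex.split(docker_command)
    mounts = []
    # Pair each token with the one that follows it; the value of -v is the next token.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-v", "--volume"):
            parts = value.split(":")
            host = parts[0]
            container = parts[1] if len(parts) > 1 else ""
            mounts.append((host, container))
    return mounts


if __name__ == "__main__":
    # Shortened version of the command from the thread (only two mounts kept).
    command = (
        'docker run -it --gpus "device=1" '
        "-v /home/dwhitena/.git-credentials:/root/.git-credentials "
        "-v /home/dwhitena/.gitconfig:/root/.gitconfig "
        "--rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash"
    )
    for host, container in bind_mounts(command):
        print(f"{host} -> {container}")
```

Removing one mount at a time from the list and re-running the install inside the container is one way to narrow down which mount, if any, is involved.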

  
  
Posted 3 years ago

AgitatedDove14 OK, I ran your last docker run command, and now I'm in an interactive session in the container. What's next?

  
  
Posted 3 years ago

Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically, you can just run the following, which does everything automatically (replace <TASK_ID_HERE> with the actual task ID):
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip -v /home/dwhitena/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/dwhitena/.clearml/cache:/clearml_agent_cache -v /home/dwhitena/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash -c echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id <TASK_ID_HERE>
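One way to look into the hunch about the mounts from inside the container is to list which paths are mount points. The sketch below parses the text of /proc/self/mountinfo, where the fifth field of each line is the mount point, and reports whether a given path is one of them. A file that docker bind-mounts on its own, such as /root/.git-credentials in the command above, shows up there as a separate mount point. The helper names are hypothetical, and a positive result only shows that the file is bind-mounted; it does not by itself prove that the mount causes the credential store error.

```python
from pathlib import Path


def mount_points(mountinfo_text):
    """Return the mount point (fifth field) of every line of mountinfo text."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def is_mount_point(path, mountinfo_text):
    """True if `path` appears as a mount point in the given mountinfo text."""
    return path in mount_points(mountinfo_text)


if __name__ == "__main__":
    mountinfo = Path("/proc/self/mountinfo")
    if mountinfo.exists():  # Linux only
        text = mountinfo.read_text()
        for candidate in ("/root/.git-credentials", "/root/.gitconfig"):
            print(candidate, is_mount_point(candidate, text))
```

Run inside the container, this prints True for paths that are their own mounts and False for ordinary files.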

  
  
Posted 3 years ago

Dang, I just closed the docker session. Should I open it again and try your command again?

  
  
Posted 3 years ago

So what is the difference?!

  
  
Posted 3 years ago

Yes my bad 😞
Let's try again:
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip -v /home/dwhitena/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/dwhitena/.clearml/cache:/clearml_agent_cache -v /home/dwhitena/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash
Then inside the docker:
echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id 1e876021bbef49a291d66ac9a2270705

  
  
Posted 3 years ago

Ok, all good there AgitatedDove14

  
  
Posted 3 years ago

Should I try anything else?

  
  
Posted 3 years ago

Nooooooooooooooooooooooo

  
  
Posted 3 years ago

It's one where I reset it and cleared out the Installed Packages so that it contains only transformers @ git+https://github.com/huggingface/transformers@61c506349134db0a0a2fd6fb2eff8e29a2f84e79.
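For reference, a requirement line of this shape has three parts: the package name, the repository URL, and the pinned revision. The sketch below splits them apart. It is a simplified illustration with a hypothetical helper name, not how pip or clearml-agent parse requirements; it only handles the `name @ git+URL@revision` form and ignores extras such as `#egg=` fragments.

```python
def parse_vcs_requirement(line):
    """Split 'name @ git+URL[@revision]' into (name, url, revision)."""
    name, sep, ref = line.partition("@")
    name, ref = name.strip(), ref.strip()
    if not sep or not ref.startswith("git+"):
        raise ValueError(f"not a git requirement: {line!r}")
    url = ref[len("git+"):]
    scheme, _, rest = url.partition("://")
    # An '@' before the first '/' belongs to credentials (user:token@host),
    # so only an '@' after the host part marks the revision.
    slash = rest.find("/")
    at = rest.rfind("@")
    if slash >= 0 and at > slash:
        repo, revision = rest[:at], rest[at + 1:]
    else:
        repo, revision = rest, None
    return name, f"{scheme}://{repo}", revision


if __name__ == "__main__":
    requirement = (
        "transformers @ git+https://github.com/huggingface/transformers"
        "@61c506349134db0a0a2fd6fb2eff8e29a2f84e79"
    )
    print(parse_vcs_requirement(requirement))
```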

  
  
Posted 3 years ago

Looks like a few permission issues?

  
  
Posted 3 years ago

IrritableOwl63 pm'd you a task ID

  
  
Posted 3 years ago

Okay now let's try the final lines:
$LOCAL_PYTHON -m virtualenv /root/venv
/root/venv/bin/python3 -m pip install git+

  
  
Posted 3 years ago

1e876021bbef49a291d66ac9a2270705 just make sure you reset it 🙂

  
  
Posted 3 years ago

Ohh, yes, we need to map the correct clearml.conf, sorry. Try this (I fixed both the clearml.conf mapping and the .ssh folder mapping):
docker run -t --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /home/dwhitena/clearml.conf:/root/clearml.conf -v /home/dwhitena/.ssh:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip -v /home/dwhitena/.clearml/pip-download-cache:/root/.clearml/pip-download-cache -v /home/dwhitena/.clearml/cache:/clearml_agent_cache -v /home/dwhitena/.clearml/vcs-cache:/root/.clearml/vcs-cache --rm nvidia/cuda:11.4.0-devel-ubuntu18.04 bash

  
  
Posted 3 years ago

In the UI, find the task (just search for the Task ID, it will find it), then right-click it and select "reset".

  
  
Posted 3 years ago

What am I resetting? Just to confirm?

  
  
Posted 3 years ago

SmallDeer34 can you get me a TASK ID of one of the jobs that failed for you?

  
  
Posted 3 years ago

Running now AgitatedDove14 . No worries.

  
  
Posted 3 years ago

Hmmm...
clearml_agent: ERROR: Could not find task id=1e876021bbef49a291d66ac9a2270705 (for host: )

  
  
Posted 3 years ago

I think what happened is that you ran it on the host machine (not inside the docker).
I probably missed a " somewhere.

  
  
Posted 3 years ago

Hmmmm.... when I run that command AgitatedDove14 I can't seem to do anything in the resulting shell. It just hangs on any command (including things like ls etc.)

  
  
Posted 3 years ago

Wait, IrritableOwl63, this looks like it worked, am I right? huggingface was correctly installed.

  
  
Posted 3 years ago

Here is the whole session AgitatedDove14

  
  
Posted 3 years ago

Let's try:
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent
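The `for i in {10..5}` loop in the command above decides which interpreter ends up in LOCAL_PYTHON: it walks from python3.10 down to python3.5 and keeps the first one it finds. The sketch below mirrors that selection in Python so the choice is easier to reason about. It is a simplification with a hypothetical function name: the shell version also requires `pip --version` to succeed for the candidate, which is left out here.

```python
import shutil


def pick_local_python(which=shutil.which, minors=range(10, 4, -1)):
    """Return the first python3.X found on PATH, newest first, else 'python3'."""
    for minor in minors:
        path = which(f"python3.{minor}")
        if path:
            return path
    # Same fallback as the shell command: plain python3.
    return "python3"


if __name__ == "__main__":
    print(pick_local_python())
```

Because the newest interpreter wins, a container image that ships a newer python3.X can end up on a different Python version than an earlier run.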

  
  
Posted 3 years ago

Can you send the console output of this entire session please ?

  
  
Posted 3 years ago

Successfully installed...

  
  
Posted 3 years ago

Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it first)

  
  
Posted 3 years ago

But from the log it seems that:
You are not running as root in the docker?
Python 3.8 is installed (and not Python 3.6 as before)
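Both points can be checked directly from inside the container. The following is a minimal sketch (the function name is hypothetical) that reports the running interpreter's version and whether the process runs as root; `os.geteuid` is available on Unix only.

```python
import os
import sys


def environment_summary():
    """Report the interpreter version and whether the process runs as root."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "is_root": os.geteuid() == 0,  # Unix only
    }


if __name__ == "__main__":
    print(environment_summary())
```

Running it with the same interpreter that LOCAL_PYTHON points to makes the result comparable with what the agent uses.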

  
  
Posted 3 years ago