Answered
Hi There, I Am Running A Clearml-Agent In Services Mode (With Docker) On A Machine With Two Disks: One With The Os (8Go, 91% Space Used) And One For The Data (100Go, 40% Space Used). When Executing The Auto-Scaler Task In This Agent, I Get The Following E

Hi there, I am running a clearml-agent in services mode (with docker) on a machine with two disks: one with the OS (8 GB, 91% space used) and one for the data (100 GB, 40% space used). When executing the auto-scaler task in this agent, I get the following error:
ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsooa09cvy.txt']' returned non-zero exit status 1.
2021-04-19 14:11:30
User aborted: stopping task (3)
I don’t understand why there is no space left, since I specify /data (where the 100 GB disk is mounted) in the clearml.conf for the following locations:
(base) ubuntu@server:~$ cat clearml.conf | grep /data
venvs_dir = /data/clearml_cache/venvs-builds
path: /data/clearml_cache/vcs-cache
path: /data/clearml_cache/pip-download-cache
docker_pip_cache = /data/clearml_cache/pip-cache
docker_apt_cache = /data/clearml_cache/apt-cache
default_base_dir: "/data/clearml_cache/cache"
Most likely I forgot something?

  
  
Posted 3 years ago

Answers 17


with the CLI, on a conda env located in /data

  
  
Posted 3 years ago

JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine, which means it is all stored inside the docker container.
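
If the venv is built inside the container, the space it takes comes from wherever Docker keeps its storage, which on a default Linux install is /var/lib/docker (typically on the OS disk). A minimal sketch to compare how full each filesystem is, assuming the paths below exist on the machine (the candidate list and the helper name `free_space_report` are illustrative, not part of clearml-agent):

```python
import os
import shutil


def free_space_report(paths):
    """Return {path: (used_fraction, free_gib)} for the filesystem holding each path."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[path] = (usage.used / usage.total, usage.free / 2**30)
    return report


if __name__ == "__main__":
    # Assumed locations: the OS root, the data mount from the question,
    # and Docker's default storage directory on Linux.
    candidates = ["/", "/data", "/var/lib/docker"]
    existing = [p for p in candidates if os.path.exists(p)]
    for path, (used, free) in free_space_report(existing).items():
        print(f"{path}: {used:.0%} used, {free:.1f} GiB free")
```

If /var/lib/docker reports the same usage as /, the container storage is sitting on the small OS disk.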

  
  
Posted 3 years ago

JitteryCoyote63 how are you running the agent?

  
  
Posted 3 years ago

it will constantly try to resend logs

Note that this happens in the background; in theory you will just get stderr messages when it fails to send, but the training should continue.

  
  
Posted 3 years ago

Will it freeze/crash/break/stop the ongoing experiments?

  
  
Posted 3 years ago

Worked like a charm 👌

  
  
Posted 3 years ago

I was rather wondering why clearml was taking space while I configured it to use the /data volume. But as you described AgitatedDove14 it looks like an edge case, so I don’t mind 🙂

  
  
Posted 3 years ago

/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only

  
  
Posted 3 years ago

Maybe there is a setting in docker to move the space used to a different location?

Not that I know of...

I can simply increase the storage of the first disk, no problem with that

probably the easiest 🙂

But as you described AgitatedDove14 it looks like an edge case, so I don’t mind

🙂

  
  
Posted 3 years ago

YEY

  
  
Posted 3 years ago

JitteryCoyote63 it should just "freeze" after a while as it will constantly try to resend logs. Basically you should be fine 🙂
(If for some reason something crashed, please let me know so we can fix it)

  
  
Posted 3 years ago

Alright, I will try now

  
  
Posted 3 years ago

And the command is?

  
  
Posted 3 years ago

Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
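
For what it is worth, the Docker daemon itself (not clearml-agent) reads a `data-root` key from /etc/docker/daemon.json that controls where images and container layers are stored. A hedged sketch that only builds such a config as text; the target directory /data/docker and the helper name `docker_daemon_config` are assumptions for illustration, and nothing in this thread confirms the approach was tried here:

```python
import json


def docker_daemon_config(data_root, existing=None):
    """Return daemon.json text with "data-root" set, keeping any existing keys."""
    config = dict(existing or {})
    config["data-root"] = data_root  # directory where Docker stores images and layers
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # /data/docker is a hypothetical directory on the larger disk.
    print(docker_daemon_config("/data/docker"))
```

The daemon has to be restarted to pick up the change, and existing images and containers are not moved automatically, so simply growing the first disk may still be the easier route.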

  
  
Posted 3 years ago

I have to admit, mounting it to a different drive is a good reason to bring this feature back. The reasoning for dropping it was that the agent would then need to make sure it manages those folders (e.g. when multiple agents are running on the same machine).

  
  
Posted 3 years ago

🤞

  
  
Posted 3 years ago

AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)

  
  
Posted 3 years ago