Hi There, I Am Running A Clearml-Agent In Services Mode (With Docker) On A Machine With Two Disks: One With The Os (8Go, 91% Space Used) And One For The Data (100Go, 40% Space Used). When Executing The Auto-Scaler Task In This Agent, I Get The Following E

Answered

Hi there, I am running a clearml-agent in services mode (with docker) on a machine with two disks: one with the OS (8 GB, 91% space used) and one for the data (100 GB, 40% space used). When executing the auto-scaler task in this agent, I get the following error:

ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsooa09cvy.txt']' returned non-zero exit status 1.
2021-04-19 14:11:30 User aborted: stopping task (3)

I don’t understand why there is no space left, since I specify /data (where the 100 GB disk is mounted) in clearml.conf for the following locations:

(base) ubuntu@server:~$ cat clearml.conf | grep /data
venvs_dir = /data/clearml_cache/venvs-builds
path: /data/clearml_cache/vcs-cache
path: /data/clearml_cache/pip-download-cache
docker_pip_cache = /data/clearml_cache/pip-cache
docker_apt_cache = /data/clearml_cache/apt-cache
default_base_dir: "/data/clearml_cache/cache"

Most likely I forgot something?

  				
Posted 3 years ago
JitteryCoyote63


Answers 17

JitteryCoyote63 it should just "freeze" after a while as it will constantly try to resend logs. Basically you should be fine 🙂
(If for some reason something crashed, please let me know so we can fix it)

  				
Posted 3 years ago
AgitatedDove14

with the CLI, on a conda env located in /data

  				
Posted 3 years ago
JitteryCoyote63

JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine, which means it is all stored inside the docker container.
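If the venv build is stored inside the container, it presumably consumes space on whichever disk holds Docker's own storage rather than on /data. A minimal sketch for checking which disk is filling up, using only the Python standard library. The paths "/" and "/data" come from this thread; "/var/lib/docker" is an assumption about where Docker keeps its data on a default Linux install, and `free_space_report` is a hypothetical helper name.

```python
import shutil


def free_space_report(paths):
    """Return {path: (used_percent, free_gib)} for each path that can be inspected."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            continue  # path missing or not accessible on this machine
        if usage.total == 0:
            continue  # avoid dividing by zero on pseudo filesystems
        used_percent = round(100 * usage.used / usage.total, 1)
        free_gib = round(usage.free / 2**30, 2)
        report[path] = (used_percent, free_gib)
    return report


if __name__ == "__main__":
    # "/var/lib/docker" is an assumed default location, not confirmed in this thread.
    candidates = ["/", "/data", "/var/lib/docker"]
    for path, (used, free) in free_space_report(candidates).items():
        print(f"{path}: {used}% used, {free} GiB free")
```

If the assumed Docker directory reports the same usage as "/", both sit on the OS disk, which would be consistent with the "No space left on device" error above.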

  				
Posted 3 years ago
AgitatedDove14

🤞

  				
Posted 3 years ago
AgitatedDove14

> it will constantly try to resend logs

Note that this happens in the background. In theory you will just get stderr messages when it fails to send, but the training should continue.

  				
Posted 3 years ago
AgitatedDove14

Will it freeze/crash/break/stop the ongoing experiments?

  				
Posted 3 years ago
JitteryCoyote63

AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)

  				
Posted 3 years ago
JitteryCoyote63

/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only

  				
Posted 3 years ago
JitteryCoyote63

Alright, I will try now

  				
Posted 3 years ago
JitteryCoyote63

YEY

  				
Posted 3 years ago
AgitatedDove14

JitteryCoyote63 how are you running the agent?

  				
Posted 3 years ago
SuccessfulKoala55

I was rather wondering why clearml was taking space while I configured it to use the /data volume. But as you described AgitatedDove14 it looks like an edge case, so I don’t mind 🙂

  				
Posted 3 years ago
JitteryCoyote63

I have to admit that mounting it to a different drive is a good reason to bring this feature back. The reasoning for dropping it was that mounting means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine).

  				
Posted 3 years ago
AgitatedDove14

And the command is?

  				
Posted 3 years ago
SuccessfulKoala55

Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that

  				
Posted 3 years ago
JitteryCoyote63

Worked like a charm 👌

  				
Posted 3 years ago
JitteryCoyote63

> Maybe there is a setting in docker to move the space used to a different location?

Not that I know of...

> I can simply increase the storage of the first disk, no problem with that

probably the easiest 🙂

> But as you described, it looks like an edge case, so I don’t mind

🙂
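A side note on the Docker question quoted above: the Docker daemon does accept a "data-root" key in its daemon configuration file (by default /etc/docker/daemon.json on Linux), which sets the directory where images and containers are stored. The sketch below only builds that JSON. The target directory "/data/docker" and the helper name `with_data_root` are hypothetical, and nobody tried this in the thread. The daemon would need a restart after the change, and existing Docker data is not moved automatically.

```python
import json


def with_data_root(existing_config, data_root):
    """Return a copy of a Docker daemon config dict with "data-root" set."""
    config = dict(existing_config)  # leave the caller's dict untouched
    config["data-root"] = data_root
    return config


if __name__ == "__main__":
    # "/data/docker" is a hypothetical directory on the larger disk.
    new_config = with_data_root({}, "/data/docker")
    # Print the JSON instead of writing it, so nothing on the system is changed.
    print(json.dumps(new_config, indent=2))
```

Printing rather than writing the file is deliberate: merging into an existing daemon.json and restarting Docker are left as manual steps, since both affect every container on the machine.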

  				
Posted 3 years ago
AgitatedDove14
