Answered
Hi Again, I Am Trying To Make The Aws Autoscaler Work With Ec2 Instances, But It Fails To Setup The Agent In The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

Hi again, I am trying to make the AWS autoscaler work with EC2 instances, but it fails to set up the agent on the machine: the logs of the user-data script show that it fails while updating the machine (see below)
[ 142.425915] cloud-init[1949]: Reading package lists...
[ 142.470754] cloud-init[1949]: E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
[ 142.474848] cloud-init[1949]: E: Unable to lock directory /var/lib/apt/lists/
[ 142.520947] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 142.525393] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 142.569601] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 142.573702] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 142.617883] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 142.621645] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 142.666691] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 142.670841] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 142.716479] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 142.720421] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 142.853612] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 142.885863] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 142.911694] cloud-init[1949]: /usr/bin/python3: No module named virtualenv
[ 142.915263] cloud-init[1949]: /var/lib/cloud/instance/scripts/part-001: line 12: clearml_agent_venv/bin/activate: No such file or directory
[ 142.916332] cloud-init[1949]: /var/lib/cloud/instance/scripts/part-001: line 13: python: command not found
[ 142.923711] cloud-init[1949]: % Total % Received % Xferd Average Speed Time Time Time Current
[ 142.924663] cloud-init[1949]: Dload Upload Total Spent Left Speed
[ 142.925851] cloud-init[1949]: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 19 100 19 0 0 19000 0 --:--:-- --:--:-- --:--:-- 19000
[ 143.366592] cloud-init[1949]: E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
[ 143.371130] cloud-init[1949]: E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
[ 143.414392] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 143.447130] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 143.479183] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 143.509085] cloud-init[1949]: /usr/bin/python3: No module named pip
[ 145.029505] cloud-init[1949]: fatal: $HOME not set
[ 147.651857] cloud-init[1949]: /var/lib/cloud/instance/scripts/part-001: line 52: python: command not found

  
  
Posted 2 years ago

Answers 19


Hi JitteryCoyote63, which EC2 instance type and AMI are you using?

  
  
Posted 2 years ago

can you attach the full log of the instance? Did the aws scalar output any logs?

  
  
Posted 2 years ago

AMI ami-08e9a0e4210f38cb6, region: eu-west-1a

  
  
Posted 2 years ago

edited the aws_auto_scaler.py, actually I think it’s just a typo, I just need to double the brackets
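The "unexpected '{' in field name" error reported in this thread is the message Python's str.format raises when a template contains nested literal braces, such as the bash brace expansion in the lock-wait line. The sketch below reproduces that behaviour and shows the brace doubling; that the autoscaler passes the startup script through str.format is an assumption, and try_format and escape_braces are hypothetical helper names, not part of ClearML.

```python
# Bash line from this thread, shortened; the braces are bash brace expansion.
raw = (
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock "
    ">/dev/null 2>&1; do sleep 5; done"
)


def try_format(template):
    """Run str.format on the template and report a failure as text."""
    try:
        return template.format()
    except ValueError as exc:
        return "ValueError: {}".format(exc)


def escape_braces(text):
    """Double every brace so str.format treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


# Unescaped braces are parsed as a replacement field and rejected.
print(try_format(raw))

# After doubling, formatting gives back the original bash line unchanged.
print(try_format(escape_braces(raw)) == raw)
```

If the script is not formatted at all on the autoscaler side, doubling the braces would instead leave literal double braces in the bash script, so this only helps when a format step is actually applied.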

  
  
Posted 2 years ago

doesn’t really work unfortunately

  
  
Posted 2 years ago

5,6 mins exactly

  
  
Posted 2 years ago

Here is the log from the instance

  
  
Posted 2 years ago

The running task in the UI for it

  
  
Posted 2 years ago

How did you add it? Just edited the configuration part of the task or with the wizard?

  
  
Posted 2 years ago

so what worked for me was the following startup userscript:
#!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential
python3 -m pip install -U pip
...

  
  
Posted 2 years ago

What do you mean by aws scalar?

  
  
Posted 2 years ago

Now it starts, I’ll see if this solves the issue

  
  
Posted 2 years ago

the instances take so long to start, around 5 mins

  
  
Posted 2 years ago

As you can see, more hard waiting (initial sleep), and then before each apt action, make sure there is no lock
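The pattern described here (an initial sleep, then a lock check before every apt action) could be generated rather than written by hand. This is a minimal sketch, assuming the user-data script is supplied as a list of lines; with_apt_lock_waits and APT_LOCK_WAIT are hypothetical names, not ClearML API, and the commands are the ones quoted in this thread.

```python
# Hypothetical helper, not part of ClearML: builds a user-data script in which
# every apt-get command is preceded by a wait on the apt/dpkg lock files.
APT_LOCK_WAIT = (
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock "
    ">/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; "
    "sleep 5; done"
)


def with_apt_lock_waits(commands, initial_sleep=120):
    """Return script lines: shebang, initial sleep, then the commands,
    with a lock-wait loop inserted before each apt-get call."""
    lines = ["#!/bin/bash", "sleep {}".format(initial_sleep)]
    for command in commands:
        if "apt-get" in command:
            lines.append(APT_LOCK_WAIT)
        lines.append(command)
    return lines


script = with_apt_lock_waits([
    "sudo apt-get update",
    "sudo apt-get install -y python3-dev python3-pip gcc git build-essential",
    "python3 -m pip install -U pip",
])
print("\n".join(script))
```

The 120-second default mirrors the sleep in the script posted in this thread; it is a tuning choice, not a required value.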

  
  
Posted 2 years ago

agree

E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
Another process is using the lock. Can you specify the AMI (and region) so I can try to reproduce it?

  
  
Posted 2 years ago

the deep learning AMI from nvidia (Ubuntu 18.04)

  
  
Posted 2 years ago

on a p3.2xlarge instance

  
  
Posted 2 years ago

I think waiting for the apt locks to be released with something like this would work
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
Weirdly this throws an error in the autoscaler:
Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field name

  
  
Posted 2 years ago

there is no error from this side, I think the aws autoscaler just waits for the agent to connect, which will never happen since the agent won’t start because the userdata script fails

  
  
Posted 2 years ago