Now it starts, I’ll see if this solves the issue
the deep learning AMI from NVIDIA (Ubuntu 18.04)
the instances take so long to start, like 5 mins
so what worked for me was the following startup userscript:
#!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential
python3 -m pip install -U pip
...
As you can see, more hard waiting (initial sleep), and then before each apt action, make sure there is no lock
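The same pattern (wait for the apt locks before every apt command) can be generated rather than copy-pasted. A minimal sketch, assuming the startup script is kept as a list of bash lines; the helper name `with_apt_lock_wait` and its `initial_sleep` parameter are made up for illustration and are not part of the autoscaler:

```python
# Hypothetical helper, not part of any autoscaler API.
APT_LOCK_WAIT = (
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock "
    ">/dev/null 2>&1; do "
    "echo 'Waiting for other instances of apt to complete...'; sleep 5; done"
)


def with_apt_lock_wait(commands, initial_sleep=120):
    """Return bash script lines with a lock wait before each apt-get call."""
    lines = ["#!/bin/bash", f"sleep {initial_sleep}"]
    for cmd in commands:
        # Only commands that invoke apt-get need to wait for the locks.
        if "apt-get" in cmd.split():
            lines.append(APT_LOCK_WAIT)
        lines.append(cmd)
    return lines


script_lines = with_apt_lock_wait([
    "sudo apt-get update",
    "sudo apt-get install -y python3-dev python3-pip gcc git build-essential",
    "python3 -m pip install -U pip",
])
print("\n".join(script_lines))
```

The pip line gets no wait in front of it, since only the apt-get calls contend for the apt/dpkg locks.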
I think waiting for the apt locks to be released with something like this would work:
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
Weirdly this throws an error in the autoscaler:
Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field name
I edited aws_auto_scaler.py; actually I think it’s just a typo, I just need to double the curly brackets
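For context, `unexpected '{' in field name` looks like the error Python's `str.format` raises when it meets a nested `{` inside what it takes to be a replacement field, and the bash brace expansion in the lock path looks exactly like that to it. A minimal sketch, assuming the autoscaler passes the startup script through `str.format` (a guess based on the error message, not checked against the autoscaler source); `escape_braces` is a made-up helper:

```python
LOCK_WAIT = (
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock "
    ">/dev/null 2>&1; do sleep 5; done"
)


def escape_braces(line):
    """Double every brace so str.format treats it as a literal."""
    return line.replace("{", "{{").replace("}", "}}")


# Unescaped: format() tries to parse the bash braces as a replacement field.
try:
    LOCK_WAIT.format()
    raised = False
except ValueError as err:
    raised = True
    print("format failed:", err)

# Escaped: format() turns the doubled braces back into single ones.
escaped = escape_braces(LOCK_WAIT)
restored = escaped.format()
print(restored == LOCK_WAIT)
```

So doubling the brackets only changes what is written in the Python source; the script that reaches the instance still has single braces.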
AMI ami-08e9a0e4210f38cb6, region: eu-west-1a
there is no error from this side, I think the aws autoscaler just waits for the agent to connect, which will never happen since the agent won’t start because the userdata script fails
How did you add it? Just edited the configuration part of the task or with the wizard?
Hi JitteryCoyote63, which EC2 instance type and AMI are you using?
The running task in the UI for it
Can you attach the full log of the instance? Did the AWS autoscaler output any logs?
agree
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
Another process is using the lock. Can you specify the AMI (and region) so I can try to reproduce it?