whirr-user mailing list archives

From Andrei Savu <savu.and...@gmail.com>
Subject Re: non-deterministic "Could not get lock /var/lib/dpkg/lock"
Date Fri, 03 Feb 2012 18:24:12 GMT
Good catch, Karel! I tried to investigate this in the past, but I never
considered that it might be a race condition with a cron job (most of
the synchronisation tests we've added are designed to prove that Whirr
itself does not trigger this condition).

What if we stop the crond service while running the install/configure
scripts?
http://www.cyberciti.biz/faq/howto-linux-unix-start-restart-cron/
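
A minimal sketch of that idea, not something Whirr ships. The names
with_cron_stopped, CRON_SERVICE and SERVICE_CMD are made up for
illustration, and it assumes the init system exposes cron through the
usual "service <name> stop|start" command:

```shell
#!/bin/sh
# Sketch: pause cron while a provisioning step runs, then always restart it.
# CRON_SERVICE and SERVICE_CMD are illustrative names, not Whirr settings.
CRON_SERVICE="${CRON_SERVICE:-cron}"    # "cron" on Debian/Ubuntu, "crond" on RHEL/CentOS
SERVICE_CMD="${SERVICE_CMD:-service}"   # overridable so the function can be dry-run

with_cron_stopped() {
    # Stop cron; carry on even if it was not running.
    "$SERVICE_CMD" "$CRON_SERVICE" stop || true

    # Run the wrapped command and remember its exit status.
    status=0
    "$@" || status=$?

    # Restart cron whether or not the command succeeded.
    "$SERVICE_CMD" "$CRON_SERVICE" start || true
    return "$status"
}

# Example (needs root on a real node):
#   with_cron_stopped apt-get -y install hadoop-0.20
```

Stopping the daemon only prevents new jobs from starting; a cron job
that is already holding the dpkg lock may keep running, so this narrows
the race window rather than closing it.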


> In my opinion, as many of the installation/configuration steps as
> possible should be done using a config management tool (puppet/chef).
>

Totally agree + we have the needed infrastructure for this.


> Once the configuration is published to each node you can trigger
> puppet/chef as often as you like, and eventually you should reach a
> good state. Running the complete whirr-generated script(s) multiple
> times is going to be slower and much more error prone.
>

+ it's hard to make retry-friendly bash scripts.
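
For a single command, a retry wrapper is still manageable. This is a
rough sketch only; retry, RETRY_ATTEMPTS and RETRY_DELAY are
hypothetical names, and it assumes the wrapped command signals failure
through a non-zero exit status:

```shell
#!/bin/sh
# Sketch: retry a command a fixed number of times with a pause between tries.
# retry, RETRY_ATTEMPTS and RETRY_DELAY are illustrative names.
RETRY_ATTEMPTS="${RETRY_ATTEMPTS:-5}"
RETRY_DELAY="${RETRY_DELAY:-10}"   # seconds to wait between attempts

retry() {
    attempt=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0

        if [ "$attempt" -ge "$RETRY_ATTEMPTS" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi

        echo "attempt $attempt failed, retrying in ${RETRY_DELAY}s: $*" >&2
        sleep "$RETRY_DELAY"
        attempt=$((attempt + 1))
    done
}

# Example:
#   retry apt-get update
#   retry apt-get -y install hadoop-0.20
```

The hard part is everything this does not cover: a whole script is only
safe to re-run if every step in it is idempotent, which is what the
config management tools give you.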


>
> Regards,
> Karel
>
> On Mon, Oct 3, 2011 at 10:22 PM, Paul Baclace <paul.baclace@gmail.com>
> wrote:
> > Two runs of whirr on EC2 yesterday randomly failed to install Hadoop
> > components.  First it occurred on the master node, but when it occurred in
> > one slave and not another, I could find the diff of the /tmp/logs/ from
> > jclouds.  In a third run, everything worked fine.  Same scripts driving
> > whirr, same AMI, same number of nodes, same region, etc. Snippets of
> > /tmp/logs/stderr.log shown below indicate that apt-get update had "Could not
> > get lock /var/lib/dpkg/lock" on one slave, but not another.
> >
> > This is a serious reliability issue.  What is non-deterministic here?
> >
> > Paul
> >
> > ------------ slave 1 -------------------
> > + register_cloudera_repo
> > + which dpkg
> > + cat
> > + curl -s http://archive.cloudera.com/debian/archive.key
> > + sudo apt-key add -
> > + sudo apt-get update
> > E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> > E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> > + which dpkg
> > + apt-get update
> > E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> > E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> > + apt-get -y install hadoop-0.20
> >
> > -------------- slave 2 ---------------
> > + register_cloudera_repo
> > + which dpkg
> > + cat
> > + curl -s http://archive.cloudera.com/debian/archive.key
> > + sudo apt-key add -
> > + sudo apt-get update
> > + which dpkg
> > + apt-get update
> > + apt-get -y install hadoop-0.20
> > dpkg-preconfigure: unable to re-open stdin:
> > + cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.dist
> > + update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.dist 90
> > + install_cdh_hbase -c aws-ec2 -u http://apache.cs.utah.edu/hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
> >
> > -------------
>
>
>
> --
> Karel Vervaeke
> http://outerthought.org/
> Open Source Content Applications
> Makers of Kauri, Daisy CMS and Lily
>
