hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HBase on same boxes as HDFS Data nodes
Date Wed, 07 Jul 2010 17:15:44 GMT
Hey Jamie,

Using the deprecated classes should be fine - many people use them with
success.

The xceivers thing is certainly worth checking.
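If it isn't set, the datanode transceiver limit defaults to quite a low value, and a loaded HBase cluster can chew through it. The usual fix is to raise it in hdfs-site.xml on every datanode and restart them. A sketch (note the property name really is spelled "xcievers"; 4096 is a commonly suggested value, not something specific to your cluster):

```xml
<!-- hdfs-site.xml on each datanode; the value is illustrative -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```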

The other thing to check is GC tuning. Have you changed heap size or
anything in the hbase configuration, or just left it at defaults?
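For reference, heap size and GC flags live in conf/hbase-env.sh on each regionserver; a minimal sketch of the kind of settings people tune (values are illustrative, not a recommendation for your workload):

```sh
# conf/hbase-env.sh -- illustrative values only
export HBASE_HEAPSIZE=4000   # MB; the default is 1000
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```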

-Todd

On Wed, Jul 7, 2010 at 10:12 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

> One last thing, a slight oddity of our setup is that although we're on
> Hadoop 0.20.2, we were previously on 0.18.something and upgraded. That
> went fine and there have been no problems, however some convenience
> base-classes that we created for our jobs were based on the old
> pre-0.20 API, as such there are deprecation warnings all over. I am
> being consistent and using the mapred.TableOutputFormat (complete
> with deprecation), but just in case that's causing an issue, I thought
> I'd throw it in...
>
> I might try to make a version that uses only classes in the 0.20 API.
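>
> For what it's worth, a rough sketch of what that port might look like,
> using the non-deprecated org.apache.hadoop.hbase.mapreduce classes with
> the Job-based API (the table name "mytable" and class name here are
> made up for illustration, not from our actual jobs):
>
> ```java
> // Sketch: 0.20-style job setup writing to HBase via the mapreduce
> // (non-deprecated) TableOutputFormat. Assumes the HBase jars are on
> // the classpath; "mytable" is a hypothetical table name.
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> import org.apache.hadoop.mapreduce.Job;
>
> public class UpgradedLoadJob {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new HBaseConfiguration();
>     Job job = new Job(conf, "load-mytable");
>     job.setJarByClass(UpgradedLoadJob.class);
>     // ... set input format and mapper here ...
>     // Wires up mapreduce.TableOutputFormat to write Puts to "mytable"
>     TableMapReduceUtil.initTableReducerJob(
>         "mytable", IdentityTableReducer.class, job);
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>   }
> }
> ```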
>
> Thanks,
>
> Jamie
>
> On 7 July 2010 18:08, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
> > Hi Todd & JD,
> >
> > Environment:
> > All (hadoop and HBase) installed as of karmic-cdh3, which means:
> > Hadoop 0.20.2+228
> > HBase 0.89.20100621+17
> > Zookeeper 3.3.1+7
> >
> > Unfortunately my whole cluster of regionservers has now crashed, so I
> > can't really say if it was swapping too much. There is a DEBUG
> > statement just before it crashes saying:
> >
> > org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
> > hdfs://<somewhere on my HDFS, in /hbase>
> >
> > What follows is:
> >
> > WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
> > org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
> > on <file location as above> File does not exist. Holder
> > DFSClient_-11113603 does not have any open files
> >
> > It then seems to try to do some error recovery (Error Recovery for
> > block null bad datanode[0] nodes == null), fails (Could not get block
> > locations. Source file "<hbase file as before>" - Aborting). There is
> > then an ERROR org.apache...HRegionServer: Close and delete failed.
> > There is then a similar LeaseExpiredException as above.
> >
> > There are then a couple of messages from HRegionServer saying that
> > it's notifying master of its shutdown and stopping itself. The
> > shutdown hook then fires and the RemoteException and
> > LeaseExpiredExceptions are printed again.
> >
> > ulimit is set to 65000 (it's in the regionserver log, printed as I
> > restarted the regionserver), however I haven't got the xceivers set
> > anywhere; I'll give that a go. It does seem very odd, as I did have a
> > few of them fall over one at a time during a few early loads, but that
> > seemed to be because the regions weren't splitting properly, so all
> > the traffic was going to one node and it was being overwhelmed. Once I
> > throttled it, a region split seemed to get triggered after one load,
> > which flung regions all over and made subsequent loads much more
> > distributed. However, perhaps the time-bomb was ticking... I'll have a
> > go at specifying the xcievers property. I'm pretty certain I've got
> > everything else covered, except the patches referenced in the JIRA.
> >
> > I just grepped some of the log files and didn't get an explicit
> > exception with 'xciever' in it.
> >
> > I am considering downgrading(?) to 0.20.5; however, because everything
> > is installed as per karmic-cdh3, I'm a bit reluctant to do so, as
> > presumably Cloudera has tested each of these versions against the
> > others, and I don't really want to introduce further versioning issues.
> >
> > Thanks,
> >
> > Jamie
> >
> >
> > On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> Jamie,
> >>
> >> Does your configuration meet the requirements?
> >>
> >> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
> >>
> >> ulimit and xcievers, if not set, are usually time bombs that blow up
> >> when the cluster is under load.
> >>
> >> J-D
> >>
> >> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
> >>
> >>> Dear all,
> >>>
> >>> My current HBase/Hadoop architecture has HBase region servers on the
> >>> same physical boxes as the HDFS data-nodes. I'm getting an awful lot
> >>> of region server crashes. The last thing that happens appears to be a
> >>> DroppedSnapshotException, caused by an IOException: could not
> >>> complete write to file <file on HDFS>. I am running it under load;
> >>> how heavy that load is I'm not sure how to quantify, but I'm guessing
> >>> it is a load issue.
> >>>
> >>> Is it common practice to put region servers on data-nodes? Is it
> >>> common to see region server crashes when either the HDFS or region
> >>> server (or both) is under heavy load? I'm guessing that is the case as
> >>> I've seen a few similar posts. I haven't got a great deal of capacity
> >>> to separate region servers from HDFS data nodes, but it might be an
> >>> argument I could make.
> >>>
> >>> Thanks
> >>>
> >>> Jamie
> >>>
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
