hbase-user mailing list archives

From Akhtar Muhammad Din <akhtar.m...@gmail.com>
Subject Re: Hbase Performance Issue
Date Thu, 09 Jan 2014 20:27:04 GMT
I am thankful to all of you for taking the time to suggest solutions.
Recently, I implemented a solution using bulk load. It is a little faster
than the one using the client API, but it still takes far too long
compared to HDFS. Processing and saving 10 GB of data on HDFS takes only
3 minutes on a 10-node cluster, whereas HBase takes about 30 minutes.

Bulk load has its own issue: I need to partition the table beforehand,
otherwise the job runs with only one reducer. I am working on a platform
where I can't anticipate how much data will be loaded into the table, so
it is difficult to pre-split it. Is there any way I can run multiple
reducers without pre-splitting the table?
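[One possible workaround, added as an editorial sketch rather than an answer from the thread: if the incoming row keys can be sampled before the job runs, split points can be computed from the sample and the table created pre-split through the HBase admin API (`HBaseAdmin.createTable(descriptor, splitKeys)`), so that `HFileOutputFormat.configureIncrementalLoad` sets up one reducer per region. The split-point helper below is plain Java; the key format and region count are made-up examples:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitPoints {
    // Given a sample of row keys and a desired region count, pick evenly
    // spaced keys from the sorted sample as region boundaries.
    // Returns at most regionCount - 1 split keys.
    static List<String> compute(List<String> sampledKeys, int regionCount) {
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regionCount; i++) {
            int idx = i * sorted.size() / regionCount;
            String key = sorted.get(idx);
            // skip duplicates that a skewed sample would otherwise produce
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(key)) {
                splits.add(key);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sample.add(String.format("row-%03d", i));
        }
        // 4 regions -> 3 split points at the 25th/50th/75th percentiles
        System.out.println(compute(sample, 4));
        // prints [row-025, row-050, row-075]
    }
}
```

[The resulting byte arrays would then be passed as the `splitKeys` argument when creating the table, just before launching the HFile-writing job.]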






On Wed, Jan 8, 2014 at 4:53 AM, Suraj Varma <svarma.ng@gmail.com> wrote:

> Akhtar:
> There is no manual step for bulk load. You essentially have a script
> that runs the MapReduce job that creates the HFiles. On success of this
> script/command, you run the completebulkload command ... the whole bulk
> load can be automated, just like your MapReduce job.
>
> --Suraj
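[The two-step flow described above can be sketched as a single driver script. This is an editorial sketch: the jar, driver class, table name, and output directory are placeholders, and it assumes a driver that uses `HFileOutputFormat.configureIncrementalLoad`:]

```shell
#!/bin/sh
# Automated bulk load: MapReduce HFile generation, then completebulkload.
set -e  # abort if the MapReduce step fails

TABLE=mytable
HFILE_DIR=/tmp/bulkload/$(date +%s)

# Step 1: MapReduce job that writes HFiles
# (jar and main class are placeholders for your own job)
hadoop jar my-bulkload-job.jar com.example.BulkLoadDriver "$TABLE" "$HFILE_DIR"

# Step 2: hand the HFiles over to the region servers; runs only if
# step 1 succeeded, so the whole flow can sit in cron with no manual step
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles "$HFILE_DIR" "$TABLE"
```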
>
>
> On Mon, Jan 6, 2014 at 11:42 AM, Mike Axiak <mike@axiak.net> wrote:
>
> > I suggest you look at hannibal [1] to look at the distribution of the
> data
> > on your cluster:
> >
> > 1: https://github.com/sentric/hannibal
> >
> >
> > On Mon, Jan 6, 2014 at 2:14 PM, Doug Meil
> > <doug.meil@explorysmedical.com> wrote:
> >
> > >
> > > In addition to what everybody else said, look at *where* the regions
> > > are for the target table.  There may be 5 regions (for example), but
> > > look to see if they are all on the same RegionServer.
> > >
> > >
> > >
> > >
> > >
> > > On 1/6/14 5:45 AM, "Nicolas Liochon" <nkeywal@gmail.com> wrote:
> > >
> > > >It's very strange that you don't see a perf improvement when you
> > > >increase the number of nodes. Did none of the changes you made
> > > >improve performance in the end?
> > > >
> > > >You may want to check:
> > > > - the number of regions for this table. Are all the region servers
> > > >busy? Do you have splits on the table?
> > > > - how much data you actually write. Is compression enabled on this
> > > >table?
> > > > - whether you have compactions. You may want to change the max store
> > > >file settings for an infrequent write load (see
> > > >http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html).
> > > >
> > > >It would be interesting to test the 0.96 release as well.
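[A few of the checks in that list can be run from the HBase shell. An editorial sketch; 'mytable' is a placeholder table name:]

```shell
# Region count, per-server region distribution, and compaction activity
echo "status 'detailed'" | hbase shell

# Table schema: shows whether COMPRESSION is set on each column family
echo "describe 'mytable'" | hbase shell
```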
> > > >
> > > >
> > > >
> > > >On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
> > > ><vrodionov@carrieriq.com> wrote:
> > > >
> > > >>
> > > >> I think in this case, writing data to HDFS or to HFiles directly (for
> > > >> subsequent bulk loading) is the best option. HBase will never compete
> > > >> with HDFS in write speed.
> > > >>
> > > >> Best regards,
> > > >> Vladimir Rodionov
> > > >> Principal Platform Engineer
> > > >> Carrier IQ, www.carrieriq.com
> > > >> e-mail: vrodionov@carrieriq.com
> > > >>
> > > >> ________________________________________
> > > >> From: Ted Yu [yuzhihong@gmail.com]
> > > >> Sent: Saturday, January 04, 2014 2:33 PM
> > > >> To: user@hbase.apache.org
> > > >> Subject: Re: Hbase Performance Issue
> > > >>
> > > >> There're 8 items under:
> > > >> http://hbase.apache.org/book.html#perf.writing
> > > >>
> > > >> I guess you have gone through all of them :-)
> > > >>
> > > >>
> > > >> On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
> > > >> <akhtar.mdin@gmail.com> wrote:
> > > >>
> > > >> > Thanks guys for your precious time.
> > > >> > Vladimir, as Ted rightly said, I want to improve write performance
> > > >> > currently (of course I want to read data as fast as possible later
> > > >> > on).
> > > >> > Kevin, my current understanding of bulk load is that you generate
> > > >> > StoreFiles and later load them through a command-line program. I
> > > >> > don't want any manual step. Our system receives data every 15
> > > >> > minutes, so the requirement is to automate it completely through
> > > >> > the client API.
> > > >> >
> > > >> >
> > > >>
> > > >>
> > >
> > >
> >
>



-- 
Regards
Akhtar Muhammad Din
