hbase-user mailing list archives

From "Slava Gorelik" <slava.gore...@gmail.com>
Subject Re: Hbase / Hadoop Tuning
Date Thu, 02 Oct 2008 21:33:48 GMT
Thank you. I'll try to implement all of your advice.

Thanks Again and Best Regards.


On Fri, Oct 3, 2008 at 12:27 AM, Jonathan Gray <jlist@streamy.com> wrote:

> If this is the case, then certainly what is hurting you is (repeating what
> has been said before, but maybe it's clearer to you now):
>
> - Serialized round-trip RPC calls for each insert (this will eventually be
> handled with batched updates and/or parallelism in the client; for now, you
> would need to have multiple processes doing the writing... you will see a
> major improvement if you have multiple writing processes; see the sketch below)
>
> - Inserting to a single region.  As described before, you're only hitting a
> single server, so your writes are not at all being distributed.  Lower your
> region max filesize to get splits sooner.  Also, keep your eye on
> https://issues.apache.org/jira/browse/HBASE-902.  This feature is intended
> for situations like this.
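
To make the "multiple writing processes" point concrete, here is a minimal,
hypothetical sketch of a standalone writer meant to be launched as several
separate JVM processes, each covering a disjoint slice of the 100K row keys.
The table name, column name, row-key format, and the HBase 0.18
HTable/BatchUpdate calls are assumptions for illustration, not code from the
posters.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    // Hypothetical slice writer: run several as separate OS processes, e.g.
    //   java SliceWriter 0 50000   and   java SliceWriter 50000 100000
    // Each process gets its own HTable and hence its own RPC connection, so
    // the writers do not serialize behind the shared client-side lock.
    public class SliceWriter {
      public static void main(String[] args) throws Exception {
        int start = Integer.parseInt(args[0]);
        int end = Integer.parseInt(args[1]);
        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // table name assumed
        byte[] payload = new byte[1400];              // dummy 1400-byte value, as in this thread
        for (int i = start; i < end; i++) {
          BatchUpdate bu = new BatchUpdate(String.format("row%08d", i));
          bu.put("data:payload", payload);            // "family:qualifier", names assumed
          table.commit(bu);
        }
      }
    }
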
>
>
> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, October 02, 2008 1:55 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase / Hadoop Tuning
>
> Hi. My webapp simulates a row-by-row operation: it adds 100K rows in a loop.
> My time measurement starts on the line before the loop and finishes on the
> line after the loop, so there is no webapp overhead in the measurement.
> But sure, I will look more deeply into whether I am spending an extra 1 or 2
> ms on some other operation.
>
> Thank You and Best Regards.
>
>
>
>
> On Thu, Oct 2, 2008 at 11:36 PM, Jonathan Gray <jlist@streamy.com> wrote:
>
> > In this case, it would definitely hurt your performance.
> >
> > One question.  Have you done more detailed timings to determine where time
> > is spent?  With the overhead of your webapp, and it streaming insertions one
> > row at a time, is it possible that a significant amount of time is being
> > spent before or after the HBase commit (significant in this case could be
> > 1-2 ms/row)?
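
A minimal sketch of the kind of per-row timing being asked for here, assuming
the webapp's existing HTable and the same HBase 0.18 imports as its current
writer code; it separates time spent inside commit() from everything else done
per row. The row key format and column name are assumptions.

    // Hypothetical instrumentation of the existing 100K-row insert loop.
    static void timedInsert(HTable table) throws IOException {
      long inCommit = 0;
      long total = System.currentTimeMillis();
      byte[] payload = new byte[1400];
      for (int i = 0; i < 100000; i++) {
        BatchUpdate bu = new BatchUpdate("row" + i);  // row key format assumed
        bu.put("data:payload", payload);              // "family:qualifier" assumed
        long t = System.currentTimeMillis();
        table.commit(bu);                             // time spent inside the HBase client
        inCommit += System.currentTimeMillis() - t;
      }
      total = System.currentTimeMillis() - total;
      System.out.println("commit: " + inCommit + " ms, other: "
          + (total - inCommit) + " ms");
    }
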
> >
> > JG
> >
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, October 02, 2008 1:12 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Hbase / Hadoop Tuning
> >
> > Thank You.
> > Regarding doing the writes in MR jobs: the problem is that rows come to the
> > webapp one by one and I can't accumulate them into one big batch update, so
> > I would need to run an MR job for each single row. In this case, will MR
> > jobs help?
> >
> > Best Regards.
> >
> > On Thu, Oct 2, 2008 at 10:58 PM, Jim Kellerman (POWERSET) <
> > Jim.Kellerman@microsoft.com> wrote:
> >
> > > Responses inline below.
> > > > -----Original Message-----
> > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > Sent: Thursday, October 02, 2008 12:39 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Hbase / Hadoop Tuning
> > > >
> > > > Thank you, Jim, for the quick answer.
> > > > 1) If I understand correctly, using 2 clients should allow me to roughly
> > > > double the performance (more or less)?
> > >
> > > I don't know if you will get 2x performance, but it will be greater
> > > than 1x.
> > >
> > > > 2) Currently, our webapp is an HBase client using HTable - is that what
> > > > you meant when you said "(HBase, not web) clients"?
> > >
> > > If multiple requests come into your webapp, and your webapp is
> > > multithreaded, you will not see a performance increase.
> > >
> > > If your webapp runs a different process for each request, you will see
> > > a performance increase because the RPC connection will not be shared
> > > and consequently will not block on the giant lock. That is why I
> > > recommended splitting up your job using Map/Reduce.
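
As an illustration of the Map/Reduce suggestion, here is a hypothetical
map-only job against the old Hadoop 0.18 mapred API: each map task opens its
own HTable (its own RPC connection) and writes the rows from its input split.
The class, table, column, and input-file layout are invented for the sketch.

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.NullOutputFormat;

    // Hypothetical map-only bulk writer: each map task writes its split to
    // HBase through its own HTable, so nothing blocks on a shared connection.
    public class BulkWriteJob extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

      private HTable table;

      public void configure(JobConf job) {
        try {
          table = new HTable(new HBaseConfiguration(), "mytable"); // table name assumed
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      public void map(LongWritable key, Text line,
                      OutputCollector<NullWritable, NullWritable> out,
                      Reporter reporter) throws IOException {
        // Assume each input line is "<rowkey><TAB><value>".
        String[] parts = line.toString().split("\t", 2);
        BatchUpdate bu = new BatchUpdate(parts[0]);
        bu.put("data:payload", parts[1].getBytes());  // "family:qualifier" assumed
        table.commit(bu);
      }

      public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(BulkWriteJob.class);
        job.setJobName("hbase-bulk-write");
        job.setMapperClass(BulkWriteJob.class);
        job.setNumReduceTasks(0);                     // map-only job
        job.setOutputFormat(NullOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        JobClient.runJob(job);
      }
    }
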
> > >
> > > > 3) Is 64MB a minimum size for a single region, or could it be less?
> > >
> > > It could be less, but that is the default block size for the Hadoop DFS.
> > > If you make it smaller, you might want to change the default block size
> > > for Hadoop as well.
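
For reference, the Hadoop-side block size in 0.18 is controlled by
dfs.block.size (default 67108864, i.e. 64MB). A hypothetical hadoop-site.xml
entry, only relevant if you go below 64MB; the 32MB value is just an example:

    <property>
      <name>dfs.block.size</name>
      <value>33554432</value>  <!-- 32MB in bytes; example value only -->
    </property>
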
> > >
> > > > 4) When is the fix for the RPC lock on concurrent operations in a
> > > > single client planned?
> > >
> > > This change is targeted for somewhere in the next 6 months according
> > > to the roadmap.
> > >
> > >
> > > > Thank You Again and Best Regards.
> > > >
> > > >
> > > > On Thu, Oct 2, 2008 at 10:30 PM, Jim Kellerman (POWERSET) <
> > > > Jim.Kellerman@microsoft.com> wrote:
> > > >
> > > > > What you are storing is 140,000,000 bytes, so having multiple
> > > > > region servers will not help you as a single region is only
> > > > > served by a single region server. By default, regions split
> > > > > when they reach 256MB. So until the region splits, all traffic
> > > > > will go to a single region server. You might try reducing the
> > > > > maximum file size to encourage region splitting by changing the
> > > > > value of hbase.hregion.max.filesize to 64MB.
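
As a sketch, lowering that limit would be an hbase-site.xml change (property
name as in HBase 0.18):

    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>67108864</value>  <!-- 64MB in bytes; the default is 256MB -->
    </property>
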
> > > > >
> > > > > Using a single client will also limit write performance.
> > > > > Even if the client is multi-threaded, there is a big giant lock
> > > > > in the RPC mechanism which prevents concurrent requests (This
> > > > > is something we plan to fix in the future).
> > > > >
> > > > > Multiple clients do not block against one another the way multi-
> > > > > threaded clients do currently. So another way to increase
> > > > > write performance would be to run multiple (HBase, not web) clients,
> > > > > by either running multiple processes directly, or by utilizing
> > > > > a Map/Reduce job to do the writes.
> > > > >
> > > > > ---
> > > > > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > > > > Sent: Thursday, October 02, 2008 12:07 PM
> > > > > > To: hbase-user@hadoop.apache.org
> > > > > > Subject: Re: Hbase / Hadoop Tuning
> > > > > >
> > > > > > Hi. Thank you for the quick response.
> > > > > > We are using 7 machines (6 RedHat 5 and 1 SuSE Enterprise 10).
> > > > > > Each machine has 4 CPUs, 4GB RAM and a 200GB HD, connected with a
> > > > > > 1Gb network interface.
> > > > > > All machines are in the same rack. On one machine (the master) we
> > > > > > are running Tomcat with one webapp that is adding 100000 rows.
> > > > > > Nothing else is running. When no webapp is running, the CPU load is
> > > > > > less than 1%.
> > > > > >
> > > > > > We are using HBase 0.18.0 and Hadoop 0.18.0.
> > > > > > The HBase cluster is one master and 6 region servers.
> > > > > >
> > > > > > Row addition is done by BatchUpdate and commit into a single column
> > > > > > family. The data is a simple byte array (1400 bytes per row).
> > > > > >
> > > > > >
> > > > > > Thank You and Best Regards.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 2, 2008 at 9:39 PM, stack <stack@duboce.net> wrote:
> > > > > >
> > > > > > > Tell us more Slava.  HBase versions and how many regions you have
> > > > > > > in your cluster?
> > > > > > >
> > > > > > > If small rows, your best boost will likely come when we support
> > > > > > > batching of updates: HBASE-748.
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Slava Gorelik wrote:
> > > > > > >
> > > > > > >> Hi All.
> > > > > > >> Our environment: 8 Datanodes (1 is also the Namenode),
> > > > > > >> 7 of them are also region servers and 1 is the Master; default
> > > > > > >> replication is 3.
> > > > > > >> We have an application that does heavy writes with relatively
> > > > > > >> small rows, about 10Kb each. Current performance is 100000 rows
> > > > > > >> in 580000 ms, i.e. 5.8 ms/row.
> > > > > > >> Is there any way to improve this performance by some tuning /
> > > > > > >> tweaking of HBase or Hadoop?
> > > > > > >>
> > > > > > >> Thank You and Best Regards.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > >
> >
> >
>
>
