hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: hbase performance
Date Fri, 02 Apr 2010 16:29:27 GMT
Chen,

In general, you're going to get significantly different performance on clusters of the size
you are testing with.  What is the disk setup?
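
(For reference, the figures quoted below, 20 clients x 10000 rows of 100KB values in 3468628 ms, work out to under 60 writes/sec and under 6 MB/sec of payload; a quick back-of-envelope check:)

```shell
# Back-of-envelope throughput from the numbers quoted below.
# 200000 = 20 clients x 10000 rows; 3468628 ms total; 100 KB per value.
awk -v r=200000 -v t=3468628 \
  'BEGIN { printf "%.1f writes/sec, %.1f MB/sec\n", r/(t/1000), (r*100/1024)/(t/1000) }'
# prints: 57.7 writes/sec, 5.6 MB/sec
```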

Also, 2GB of RAM is simply not enough to do any real testing.  I recommend a minimum of 2GB
of heap for each RegionServer alone, though I strongly encourage 4GB of heap to get good
performance.  You'll need at least 2GB additional for the DataNode and OS.
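
A minimal sketch of what that might look like in conf/hbase-env.sh (HBASE_HEAPSIZE is the standard knob in the 0.20 scripts; the 4000 MB figure is just the recommendation above, sized to your hardware):

```shell
# conf/hbase-env.sh -- sketch of the heap recommendation above.
# HBASE_HEAPSIZE is in MB and applies to daemons started by the hbase scripts.
export HBASE_HEAPSIZE=4000
```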

JG

> -----Original Message-----
> From: Juhani Connolly [mailto:juhani@ninja.co.jp]
> Sent: Friday, April 02, 2010 3:17 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: hbase performance
> 
> On 04/02/2010 06:09 PM, Chen Bangzhong wrote:
> > my switch is Dell 2724.
> >
> >
> I'm not a network admin, and I can't tell how congested your network is
> from that (nor do I think it's possible, since there are going to be a
> lot of other factors).
> 
> Try running the test on a single machine using the miniCluster flag;
> this should eliminate network transfer as an issue. If you still get
> high throughput with everything running on a single machine, your
> network is likely the issue. If, on the other hand, throughput goes
> down significantly, the problem lies elsewhere.
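
For example, something like this should work (a sketch; the --miniCluster flag is from the 0.20-era PerformanceEvaluation usage text, so double-check it against your version):

```shell
# Stand up an in-process HDFS + HBase and run the same evaluation locally,
# taking the physical network out of the picture.
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation \
  --miniCluster --nomapred --rows=10000 randomWrite 20
```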
> > -- On April 2, 2010 at 5:04 PM, Chen Bangzhong <bangzhong@gmail.com> wrote:
> >
> >
> >>
> >> On April 2, 2010 at 4:58 PM, Juhani Connolly <juhani@ninja.co.jp> wrote:
> >>
> >>> Your results seem very low, but your system specs are also quite
> >>> moderate.
> >>>
> >>> On 04/02/2010 04:46 PM, Chen Bangzhong wrote:
> >>>
> >>>> Hi, All
> >>>>
> >>>> I am benchmarking hbase. My HDFS cluster includes 4 servers (Dell
> >>>> 860, with 2 GB RAM): one NameNode, one JobTracker, 2 DataNodes.
> >>>>
> >>>> My HBase cluster also comprises 4 servers: one Master, 2
> >>>> RegionServers, and one ZooKeeper (Dell 860, with 2 GB RAM).
> >>>>
> >>>>
> >>> While I'm far from being an authority on the matter, running
> >>> datanodes+regionservers together should help performance.
> >>> Try turning your 2 datanodes + 2 regionservers into 4 servers
> >>> running both a datanode and a regionserver.
> >>>
> >>>
> >> I will try to run datanode and region server on the same server.
> >>
> >>
> >>
> >>>> I ran org.apache.hadoop.hbase.PerformanceEvaluation on the
> >>>> ZooKeeper server. ROW_LENGTH was changed from 1000 to
> >>>> ROW_LENGTH = 100*1024, so each value will be 100k in size.
> >>>>
> >>>> hadoop version is 0.20.2, hbase version is 0.20.3.
> >>>> dfs.replication is set to 1.
> >>>>
> >>> Setting replication to 1 isn't going to give results that are very
> >>> indicative of a "real" application, making it questionable as a
> >>> benchmark. If you intend to run on a single replica at release,
> >>> you'll be at high risk of data loss.
> >>>
> >>>
> >> Since I have only 2 data nodes, I set replication to 1. In
> >> production, it will be set to 3.
> >>
> >>
> >>
> >>>> The following is the command line:
> >>>>
> >>>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred
> >>>> --rows=10000 randomWrite 20
> >>>>
> >>>> It took about one hour to complete the test (3468628 ms), about 60
> >>>> writes per second. It seems the performance is disappointing.
> >>>>
> >>>> Is there anything I can do to make hbase perform better with 100k
> >>>> values? I didn't try the methods mentioned in the performance wiki
> >>>> yet, because I thought 60 writes/sec is too low.
> >>>>
> >>>>
> >>>>
> >>> Do you mean *over* 100k size?
> >>> 2GB of RAM is pretty low and you'd likely get significantly better
> >>> performance with more, though at this scale it probably isn't a
> >>> significant problem.
> >>>
> >>>
> >> the data size is exactly 100k size.
> >>
> >>
> >>
> >>>> If the value size is 1k, hbase performs much better. 200000
> >>>> sequentialWrite rows took about 16 seconds, about 12500 writes per
> >>>> second.
> >>>>
> >>>>
> >>> Comparing sequentialWrite performance with randomWrite isn't a
> >>> helpful indicator. Do you have randomWrite results for 1k values?
> >>> The way your performance degrades with the size of the records
> >>> suggests you may have a bottleneck at network transfer. What's rack
> >>> locality like, and how much bandwidth do you have between the
> >>> servers?
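
One rough way to check point-to-point bandwidth between two of the servers (a sketch assuming dd and nc are available; the host and port are placeholders, and nc's listen syntax varies between versions):

```shell
# On the receiving server (hypothetical port 9999):
#   nc -l 9999 > /dev/null        # some nc builds want: nc -l -p 9999
# On the sending server: push 100 MB and time it.
time dd if=/dev/zero bs=1M count=100 | nc receiver-host 9999
```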
> >>>
> >>>> Now I am trying to benchmark using two clients on 2 servers, no
> >>>> result yet.
> >>>>
> >>>>
> >>>
> >> For 1k data size, the sequentialWrite performance and randomWrite
> >> performance are about the same. All my servers are under one switch;
> >> I don't know the switch bandwidth yet.
> >>
> >>
> >>
> >>>  You're already running 20 clients on your first server with the
> >>> PerformanceEvaluation. Do you mean you intend to run 20 on each?
> >>>
> >>>
> >> In fact, it is 20 threads on one machine.
> >>
> >>
> >>> Hopefully someone with better knowledge can give a better answer,
> >>> but my guess is that you have a network transfer bottleneck. Try
> >>> doing further tests with randomWrite and decreasing value sizes,
> >>> and see if the time correlates to the total amount of data written.
> >>>
> >>>
> >>>
> >>
> >

