hbase-user mailing list archives

From Weihua JIANG <weihua.ji...@gmail.com>
Subject Re: Hive+HBase performance is much poorer than Hive+HDFS
Date Thu, 13 Oct 2011 08:53:25 GMT
After setting this parameter to 1000, I get this result: hive/hbase is 4X
slower than hive/hdfs.

What slowdown factor should be expected for hive/hbase vs. hive/hdfs?

Thanks
Weihua
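
A rough sketch of why hbase.client.scanner.caching matters so much here: it
estimates how many scanner round trips each map task makes at different
caching values. The row and region counts come from the thread. The even
spread of rows across regions, the helper name scanner_round_trips, and the
baseline of 1 row per fetch (believed to be the client default in HBase
releases of that period) are assumptions for illustration only.

```python
import math

TOTAL_ROWS = 160_000_000  # from the thread: about 160M rows
REGIONS = 32              # from the thread: 32 regions, one map task per region


def scanner_round_trips(rows: int, caching: int) -> int:
    """Estimate the scanner fetches needed to stream `rows` rows
    when each fetch returns up to `caching` rows (hypothetical helper)."""
    return math.ceil(rows / caching)


# Assumption: rows are spread evenly across the regions.
rows_per_region = TOTAL_ROWS // REGIONS

# caching=1 is an assumed baseline; 500 and 1000 are the values from the thread.
for caching in (1, 500, 1000):
    trips = scanner_round_trips(rows_per_region, caching)
    print(f"caching={caching:>4}: ~{trips:,} round trips per map task")
```

Under these assumptions, raising the value from 1 to 1000 cuts the round trips
per map task by three orders of magnitude. This only counts round trips; it
does not model the remaining server-side and serialization cost of a scan.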

2011/10/12 Akash Ashok <thehellmaker@gmail.com>:
> Hi,
> To set this parameter you could use "set hbase.client.scanner.caching=500;"
> before the execution of your hive query.
>
> Cheers,
> Akash
>
> On Wed, Oct 12, 2011 at 8:34 AM, Weihua JIANG <weihua.jiang@gmail.com> wrote:
>
>> Since I am using Hive to run the query, I don't know how to set it.
>> Can you tell me how to do so?
>>
>> Thanks
>> Weihua
>>
>> 2011/10/12 Jean-Daniel Cryans <jdcryans@apache.org>:
>> > This is one big factor and you didn't mention configuring it:
>> > http://hbase.apache.org/book.html#perf.hbase.client.caching
>> >
>> > J-D
>> >
>> > On Tue, Oct 11, 2011 at 7:47 PM, Weihua JIANG <weihua.jiang@gmail.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have run some performance tests on Hive+HBase. The table is a normal 2D
>> >> table with about 160M rows (each row with 7 small columns) and 32
>> >> regions. There is only one column family, and all regions were
>> >> major compacted to one store file before the test.
>> >>
>> >> On a cluster with 11 task trackers (each with 4 map slots and 1 reduce
>> >> slot; these servers also act as region servers), a simple Hive query
>> >>   select count(*) from table where column3='Y';
>> >> needs ~1700 seconds to finish.
>> >>
>> >> But after using a CTAS statement to create an internal table (stored as
>> >> a sequence file), the same query needs only 43 seconds to finish.
>> >>
>> >> So Hive+HBase is 40X slower than Hive+HDFS.
>> >>
>> >> Although Hive+HBase has fewer map tasks (32 vs 223), there are
>> >> only 44 map slots available, so I don't think that is the main cause.
>> >>
>> >> I studied the source code of the HBase scan implementation. It seems
>> >> to me that, in my case, the scan reads the HFile in much the same way
>> >> as a sequence file is read (sequential reading of each key/value pair).
>> >> So, in theory, the performance should be quite similar.
>> >>
>> >> Can anyone explain the 40X slowdown?
>> >>
>> >> Thanks
>> >> Weihua
>> >>
>> >
>>
>
