hbase-user mailing list archives

From Christopher Tarnas <...@email.com>
Subject Re: HFiles created by MR Jobs and HBase Performance
Date Tue, 17 May 2011 14:14:37 GMT
If I understand HBase bulk loading correctly, each HFile generated needs its
keys to fit within one existing region - that is the reason the
TotalOrderPartitioner is used. I believe, however, that within one region,
before compaction, you can have multiple HFiles for a given column family,
and the HFiles do not need to have distinct key ranges; they just need to
fit within the overall range of the region. This does hurt read performance,
which is why multiple HFiles get cleaned up and condensed into one during a
compaction.
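For what it's worth, here is a minimal sketch of the usual way this gets
wired up (written against the 0.90-era API; the table name "mytable", the
column family "cf" and the toy mapper below are just placeholders, not your
job). HFileOutputFormat.configureIncrementalLoad() looks up the table's
region boundaries and installs the TotalOrderPartitioner and the sorting
reducer for you, so each reducer's HFile falls inside exactly one region:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepare {

  // Toy mapper: emits (rowkey, Put) from tab-separated "rowkey<TAB>value" lines.
  static class LineMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hfile-bulk-prepare");
    job.setJarByClass(BulkLoadPrepare.class);
    job.setMapperClass(LineMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Reads the table's region start keys and configures the
    // TotalOrderPartitioner, the sorting reducer (PutSortReducer here,
    // since the map output value is Put) and HFileOutputFormat itself.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The HFiles that land in the output directory can then be moved into the
table with the completebulkload tool
(org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles), which will also
split any HFile that crosses a region boundary at load time.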

-chris

2011/5/17 Panayotis Antonopoulos <antonopoulospan@hotmail.com>

> Hello,
> I am writing an MR job where each reducer will output one HFile containing
> some of the rows of the table that will be created.
> At first I thought of using the HashPartitioner to achieve load balancing,
> but this would mix the rows, and the output of each reducer would not be a
> contiguous part of the HBase table that will be created by combining all
> these files.
>
> So I would like to ask: is it important to use a Partitioner
> (TotalOrderPartitioner, for example) that gives each reducer a contiguous
> part of the table?
>
> If I do not do that, will it hurt HBase's performance when executing
> queries or running compactions, since rows that are supposed to be next to
> each other will end up in different HFiles and the number of disk seeks
> will increase?
>
> Thank you for your help!
> Panagiotis
>
