hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: scanner on a given column: whole table scan or just the rows that have values
Date Wed, 10 Jun 2009 06:03:08 GMT
It will not scan every row if there is more than one column family: the scan 
reads only the rows that have data in that column's family.
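
To illustrate the point above, here is a toy model in Python. It is only a sketch: it assumes each column family is kept in its own sorted store, and the names `ToyTable`, `put`, and `scan` are made up for the example and are not the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """Toy model only: one sorted store per column family (not the HBase API)."""

    def __init__(self):
        # family name -> {row key: value}; each family is stored separately
        self.stores = defaultdict(dict)

    def put(self, row_key, family, value):
        self.stores[family][row_key] = value

    def scan(self, family):
        """Yield (row key, value) in key order, reading only one family's store."""
        store = self.stores.get(family, {})
        for row_key in sorted(store):
            yield row_key, store[row_key]


table = ToyTable()

# Every row has data in the "info" family ...
for i in range(1000):
    table.put(f"row{i:04d}", "info", f"name-{i}")

# ... but only two rows have data in the "tags" family.
table.put("row0007", "tags", "red")
table.put("row0420", "tags", "blue")

# Scanning "tags" visits two entries, not all 1000 rows.
tag_rows = list(table.scan("tags"))
print(tag_rows)
```

In this model the scan over "tags" never looks at the "info" store, so its cost depends on how many rows have data in "tags" rather than on the size of the table. A column that shares a family with densely populated columns would not get this benefit.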

You do have parallelism when scanning large tables: the MR job should be 
splitting the work into one mapper per region if it is coded and set up 
correctly. New patches in dev, set for 0.20, will allow more mappers per 
region, speeding this up in some cases.
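
A rough sketch of the one-mapper-per-region idea, again as a toy model in Python rather than real MapReduce or HBase code. The region start keys, `make_splits`, and `count_rows` are hypothetical names chosen for the example, and a thread pool stands in for the mappers.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def make_splits(region_start_keys):
    """Build one (start, end) key range per region; end=None means end of table."""
    ends = region_start_keys[1:] + [None]
    return list(zip(region_start_keys, ends))


def count_rows(sorted_keys, split):
    """Stand-in for a mapper: count the rows that fall inside one region's range."""
    start, end = split
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if end is None else bisect_left(sorted_keys, end)
    return hi - lo


# Hypothetical table of 1000 sorted row keys spread over four regions.
keys = [f"row{i:04d}" for i in range(1000)]
region_starts = ["", "row0250", "row0500", "row0750"]

splits = make_splits(region_starts)

# One worker per split, each scanning only its own key range.
with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    counts = list(pool.map(lambda s: count_rows(keys, s), splits))

print(splits)
print(counts)
```

Each worker only touches its own key range, so the full scan is spread across as many workers as there are regions. Allowing more mappers per region would amount to cutting each of these ranges into smaller pieces.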

Row-based databases can have indexes, but they do not scale well, and indexes 
require more memory.
HBase is designed to be distributed, parallel, and fault tolerant, and it 
scales easily from one to hundreds to thousands of servers.

Billy



"Ric Wang" <wqt.work@gmail.com> wrote in message
news:21224f560906092144o703e9292o1587a74cceae2a3@mail.gmail.com...
> Hi,
>
> Thanks. But if it is still scanning EVERY row in the entire table, how does
> HBase achieve better scan performance, compared to a row-based database?
>
> Thanks,
> Ric
>
>
>
> On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> Without the use of indexes, there is no easy way to get the info without
>> touching every row.
>>
>> So yes you'll be scanning every row.  But hbase has good bulk scan perf.
>>
>> On Jun 9, 2009 7:24 PM, "Ric Wang" <wqt.work@gmail.com> wrote:
>>
>> How does the scanner know how to get ONLY the "relevant" rows, without a
>> whole table scan?
>>
>> Thanks!
>> Ric
>>
>> On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <naveenk@gmail.com> wrote:
>> > The scanner only s...
>> --
>>
>> Ric Wang wqt.work@gmail.com
>>
>
>
>
> -- 
> Ric Wang
> wqt.work@gmail.com
> 


