hbase-user mailing list archives

From Ric Wang <wqt.w...@gmail.com>
Subject Re: scanner on a given column: whole table scan or just the rows that have values
Date Wed, 10 Jun 2009 06:22:00 GMT
Billy,

Thank you, it's clearer to me now. But WITHIN the one family that holds the
column label being scanned (I only have one family for the entire table),
will it still have to scan EVERY row in that family, whether or not each
cell in that column has a value?

-Ric
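
The question above can be illustrated with a toy model. This is only a
sketch and not HBase code: ToyTable, put, scan_column and build are
hypothetical names, and the model assumes just two things, that absent cells
are not stored at all and that each column family is kept in its own store.
Under those assumptions, a scan over one column visits every row that has
any cell in that column's family.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a sparse, family-oriented table (not the HBase API)."""

    def __init__(self):
        # family -> row key -> qualifier -> value; absent cells are not stored
        self.stores = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value):
        self.stores[family][row][qualifier] = value

    def scan_column(self, family, qualifier):
        """Return (matches, rows_visited) for a scan over one column."""
        matches = []
        rows_visited = 0
        # Only this family's store is read; rows with no cell in this
        # family never show up in the loop.
        for row in sorted(self.stores[family]):
            rows_visited += 1
            cells = self.stores[family][row]
            if qualifier in cells:
                matches.append((row, cells[qualifier]))
        return matches, rows_visited


def build(families):
    """1000 rows with a 'name' cell each; every 100th row also has a 'flag' cell."""
    table = ToyTable()
    for i in range(1000):
        row = f"row{i:04d}"
        table.put(row, families["name"], "name", f"user{i}")
        if i % 100 == 0:
            table.put(row, families["flag"], "flag", "1")
    return table


# Same data, laid out with one family versus two families.
one_family = build({"name": "f", "flag": "f"})
two_families = build({"name": "f", "flag": "g"})

hits_one, visited_one = one_family.scan_column("f", "flag")
hits_two, visited_two = two_families.scan_column("g", "flag")
print(len(hits_one), visited_one)  # 10 1000
print(len(hits_two), visited_two)  # 10 10
```

In this model both layouts return the same 10 matches, but the
single-family layout walks all 1000 rows to find them, while the layout
that gives the sparse column its own family walks only the 10 rows that
have data there.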


On Wed, Jun 10, 2009 at 1:03 AM, Billy Pearson
<sales@pearsonwholesale.com> wrote:

> If there is more than one column family, it will not scan every row, only
> the rows that have data for that column.
>
> You do have parallelism when scanning large tables: the MR job should
> split the work into one mapper per region if it is coded and set up
> correctly. New patches in dev, slated for 0.20, will allow more mappers
> per region, which speeds this up in some cases.
>
> Row-based databases can have indexes, but they do not scale well: indexes
> require more memory.
> HBase is designed to be distributed, parallel, and fault tolerant, and it
> scales easily from 1 to hundreds to thousands of servers.
>
> Billy
>
>
>
> "Ric Wang" <wqt.work@gmail.com> wrote in message
> news:21224f560906092144o703e9292o1587a74cceae2a3@mail.gmail.com...
>
>  Hi,
>>
>> Thanks. But if it is still scanning EVERY row in the entire table, how does
>> HBase achieve better scan performance, compared to a row-based database?
>>
>> Thanks,
>> Ric
>>
>>
>>
>> On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>
>>  Without the use of indexes, there is no easy way to get the info without
>>> touching every row.
>>>
>>> So yes you'll be scanning every row.  But hbase has good bulk scan perf.
>>>
>>> On Jun 9, 2009 7:24 PM, "Ric Wang" <wqt.work@gmail.com> wrote:
>>>
>>> How does the scanner know how to get ONLY the "relevant" rows, without a
>>> whole table scan?
>>>
>>> Thanks!
>>> Ric
>>>
>>> On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <naveenk@gmail.com>
>>> wrote:
>>> > The scanner only s...
>>> --
>>>
>>> Ric Wang wqt.work@gmail.com
>>>
>>>
>>
>>
>> --
>> Ric Wang
>> wqt.work@gmail.com
>>
>>
>
>


-- 
Ric Wang
wqt.work@gmail.com
