hbase-user mailing list archives

From "Ananth T. Sarathy" <ananth.t.sara...@gmail.com>
Subject Re: Waiting forever on scanner iterator
Date Wed, 21 Oct 2009 15:05:36 GMT
Yeah,
I just don't understand why getScanner("Column:X") returns the iterator and
processes the rows, yet getScanner("Column:Y") just spins and spins, even
though Column:Y is a much denser result.

When I run a count from the shell:

Version: 0.20.0, r810752, Thu Sep  3 00:06:18 PDT 2009
hbase(main):001:0> count 'GS_Applications'
09/10/21 11:04:48 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
/hbase/root-region-server got 10.244.9.171:60020
09/10/21 11:04:48 DEBUG client.HConnectionManager$ClientZKWatcher: Got
ZooKeeper event, state: SyncConnected, type: None, path: null
09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Found ROOT
at 10.244.9.171:60020
09/10/21 11:04:48 DEBUG client.HConnectionManager$TableServers: Cached
location address: 10.245.82.160:60020, regioninfo: REGION => {NAME =>
'.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE =>
{{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION =>
'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
BLOCKCACHE => 'false'}]}}
09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached
location address: 10.242.71.191:60020, regioninfo: REGION => {NAME =>
'GS_Applications,,1255020109210', STARTKEY => '', ENDKEY => '', ENCODED =>
1732076772, TABLE => {{NAME => 'GS_Applications', FAMILIES => [{NAME =>
'Application', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'BinaryData', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'BinaryRetrieval', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'Files', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'Info', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'Network', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'Registry', VERSIONS => '1', COMPRESSION => 'NONE', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}]}}
09/10/21 11:04:49 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
/hbase/root-region-server got 10.244.9.171:60020
09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Found ROOT
at 10.244.9.171:60020
09/10/21 11:04:49 DEBUG client.HConnectionManager$TableServers: Cached
location address: 10.245.82.160:60020, regioninfo: REGION => {NAME =>
'.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE =>
{{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION =>
'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
BLOCKCACHE => 'false'}]}}
09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Creating scanner over
GS_Applications starting at key ''
09/10/21 11:04:50 DEBUG client.HTable$ClientScanner: Advancing internal
scanner to startKey at ''
09/10/21 11:04:50 DEBUG client.HConnectionManager$TableServers: Cache hit
for row <> in tableName GS_Applications: location server 10.242.71.191:60020,
location region name GS_Applications,,1255020109210


Ananth T Sarathy


On Wed, Oct 21, 2009 at 11:00 AM, stack <stack@duboce.net> wrote:

> In both cases you are doing a full table scan?
>
> Try from the shell with DEBUG enabled. You'll see the regions being loaded.
> That may help you narrow in on the problem region, or at least on the
> problem regionserver.
>
> St.Ack
>
> On Wed, Oct 21, 2009 at 7:19 AM, Ananth T. Sarathy <
> ananth.t.sarathy@gmail.com> wrote:
>
> > Anyone have any further thoughts on this?
> > Ananth T Sarathy
> >
> >
> > On Tue, Oct 20, 2009 at 6:37 PM, Ananth T. Sarathy <
> > ananth.t.sarathy@gmail.com> wrote:
> >
> > > Well, that's not the case. Every row has that column. In fact, the
> > > second snippet I sent uses a column with far fewer rows (1k vs. 25k),
> > > but it comes back pretty quickly.
> > >
> > > By "forever", I mean I have watched my logs do nothing for half an hour
> > > before giving up.
> > >
> > >
> > > Ananth T Sarathy
> > >
> > >
> > >
> > > On Tue, Oct 20, 2009 at 5:03 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > >
> > >> If you are asking for a column that is very sparse, or doesn't exist at
> > >> all, HBase has to read through the entire region to find 100 matching
> > >> rows. This could take a while. You said 'forever', but could you
> > >> quantify that?
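Ryan's explanation can be made concrete with a toy model. This is not HBase code: `SparseScanModel` and `rowsReadToFillBatch` are hypothetical names, and the model simply assumes, as Ryan describes, that with scanner caching set to 100 the server keeps reading rows until it has collected 100 rows containing the requested column (or runs out of rows) before it answers the client.

```java
import java.util.List;

// Toy model (not HBase code): counts how many stored rows a server-side scan
// must read before it has `caching` rows that contain the requested column.
final class SparseScanModel {

    private SparseScanModel() {}

    /**
     * @param hasColumn one entry per stored row, true if that row holds the column
     * @param caching   scanner caching value (100 in this thread)
     * @return rows read before the batch is full, or all rows if it never fills
     */
    static int rowsReadToFillBatch(List<Boolean> hasColumn, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        int matches = 0;
        int rowsRead = 0;
        for (boolean present : hasColumn) {
            rowsRead++;
            if (present) {
                matches++;
                if (matches == caching) {
                    break; // batch is full; the server can reply to the client
                }
            }
        }
        return rowsRead;
    }
}
```

With every row holding the column, filling one batch costs 100 row reads; if only one row in 100 holds it, the same batch costs 10,000 reads, and a column that is absent everywhere forces a read of every row before anything comes back.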
> > >>
> > >> On Tue, Oct 20, 2009 at 1:58 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> > >> > Scanner pre-fetching is always faster, so something must be wrong with
> > >> > your region server. Check the logs, top, etc.
> > >> >
> > >> > As for row size, it's pretty much a matter of how many bytes you have
> > >> > in each column: sum them up (plus some overhead for the keys).
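J-D's rule of thumb for row size can be written out as a rough calculator. This is a sketch, not an HBase API: `RowSizeEstimate` is a hypothetical helper, and the 20-byte fixed per-cell overhead assumes the KeyValue layout used since HBase 0.20 (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type). It ignores block, index and in-memory object overhead, so treat the result as a lower bound.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Rough per-row size estimate: value bytes plus per-cell key overhead.
// The fixed overhead is an assumption based on the KeyValue layout.
final class RowSizeEstimate {

    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1) = 20 bytes
    static final int FIXED_CELL_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    private RowSizeEstimate() {}

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** cells maps "Family:qualifier" to the cell value, stored as text. */
    static long estimateBytes(String rowKey, Map<String, String> cells) {
        int rowLen = utf8Length(rowKey);
        long total = 0;
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            String[] famQual = cell.getKey().split(":", 2);
            int famLen = utf8Length(famQual[0]);
            int qualLen = famQual.length > 1 ? utf8Length(famQual[1]) : 0;
            int valueLen = utf8Length(cell.getValue());
            // The row key is stored again in every cell.
            total += FIXED_CELL_OVERHEAD + rowLen + famLen + qualLen + valueLen;
        }
        return total;
    }
}
```

For row key `app1` with a single cell `Files:Name` = `setup.exe`, this gives 20 + 4 + 5 + 4 + 9 = 42 bytes. Because the row key is repeated in every cell, long row keys add up quickly on wide rows.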
> > >> >
> > >> > For the column query, you want filters; check the filter package in the javadoc.
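Short of a filter, one fallback for "all rows without Files:path1" is to check on the client side. A scanner restricted to Files:path1 only returns rows that have that column, so the scan would also have to request some column that every row carries (which column that is, is an assumption about this table). The sketch models rows as plain maps rather than HBase results, and `MissingColumnFinder` is a hypothetical name. It still reads every row, so it does not make the scan any cheaper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Client-side check: keep the keys of rows whose cells do not include `column`.
// Rows are modelled as "Family:qualifier" -> value maps, keyed by row key.
final class MissingColumnFinder {

    private MissingColumnFinder() {}

    static List<String> rowsWithout(SortedMap<String, Map<String, String>> rows, String column) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (!row.getValue().containsKey(column)) {
                missing.add(row.getKey());
            }
        }
        return missing; // in row-key order, like a scan
    }
}
```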
> > >> >
> > >> > J-D
> > >> >
> > >> > On Tue, Oct 20, 2009 at 1:52 PM, Ananth T. Sarathy
> > >> > <ananth.t.sarathy@gmail.com> wrote:
> > >> >> OK, but how come when I run a similar call (with fewer returned rows,
> > >> >> 1000 vs. 25k in the previous one) it runs through the iterator very
> > >> >> quickly? (See below.)
> > >> >>
> > >> >> Also, how do I determine the row size? It's just text data, and really
> > >> >> not much.
> > >> >>
> > >> >> Finally, is there a way to query for rows that do not have a column?
> > >> >> (i.e., all rows without Files:path1)
> > >> >>
> > >> >>        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
> > >> >>                "GS_Applications");
> > >> >>
> > >> >>        String[] columns = { "Files:path1" };
> > >> >>        log.info("Getting all Rows with Files");
> > >> >>        Scanner s = htdmni.getScannerForAllRows(columns);
> > >> >>        log.info("Got all Rows with Files");
> > >> >>
> > >> >>        Iterator<RowResult> iter = s.iterator();
> > >> >>        out.write("Application_Full_Name,Version,Application_installer_name,Operating System, Application_platform,Application_sub_category,md5Hash,Sha1Hash,Sha256Hash,filepath,fileName,modified,size,operation\n");
> > >> >>        out.write("<BR>");
> > >> >>        while (iter.hasNext())
> > >> >>        {
> > >> >>
> > >> >> Ananth T Sarathy
> > >> >>
> > >> >>
> > >> >> On Tue, Oct 20, 2009 at 4:44 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> > >> >>
> > >> >>> If you have a very slow data source (S3), the region server fetches
> > >> >>> 100 rows before coming back to your client with all of them, and that
> > >> >>> can take a lot of time. Also make sure that 100 of your rows can fit
> > >> >>> in a region server's memory. How big is each row?
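The effect of scanner caching that J-D describes can be seen in a toy client-side iterator. This is a simulation, not the HBase client: `CachingScannerModel` is hypothetical, and each `fetchBatch()` call stands in for one round trip that returns up to `caching` rows. It illustrates that the first `hasNext()` cannot return until a whole batch has been assembled, while the following calls are served from the local buffer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model of scanner caching: rows are pulled from a simulated server in
// batches of `caching`, and next() is answered from the local buffer.
final class CachingScannerModel implements Iterator<String> {
    private final List<String> serverRows; // stands in for the region's rows
    private final int caching;
    private final List<String> buffer = new ArrayList<>();
    private int serverPos = 0;  // next row the simulated server will read
    private int bufferPos = 0;  // next buffered row to hand to the caller
    private int roundTrips = 0;

    CachingScannerModel(List<String> serverRows, int caching) {
        if (caching < 1) throw new IllegalArgumentException("caching must be >= 1");
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // One simulated round trip: the server returns up to `caching` rows at once.
    private void fetchBatch() {
        buffer.clear();
        bufferPos = 0;
        int end = Math.min(serverPos + caching, serverRows.size());
        buffer.addAll(serverRows.subList(serverPos, end));
        serverPos = end;
        roundTrips++;
    }

    @Override
    public boolean hasNext() {
        if (bufferPos < buffer.size()) return true;      // served locally
        if (serverPos >= serverRows.size()) return false; // nothing left
        fetchBatch();                                     // blocks for a full batch
        return bufferPos < buffer.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(bufferPos++);
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With 250 rows and caching set to 100 the model makes 3 round trips; with caching set to 1 it makes 250. A slow source or large rows make each of those batches expensive, which is the trade-off J-D points at.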
> > >> >>>
> > >> >>> J-D
> > >> >>>
> > >> >>> On Tue, Oct 20, 2009 at 1:32 PM, Ananth T. Sarathy
> > >> >>> <ananth.t.sarathy@gmail.com> wrote:
> > >> >>> > I am running this code, where
> > >> >>> >
> > >> >>> > getScannerForAllRows(columns) just does return table.getScanner(columns);
> > >> >>> >
> > >> >>> > and the table has setScannerCaching(100);
> > >> >>> >
> > >> >>> > But it spins forever after getting the iterator. Why would that be?
> > >> >>> > How can I speed it up?
> > >> >>> >
> > >> >>> >        HBaseTableDataManagerImpl htdmni = new HBaseTableDataManagerImpl(
> > >> >>> >                "GS_Applications");
> > >> >>> >
> > >> >>> >        String[] columns = { "Files:Name" };
> > >> >>> >        log.info("Getting all Rows with Files");
> > >> >>> >        Scanner s = htdmni.getScannerForAllRows(columns);
> > >> >>> >        log.info("Got all Rows with Files");
> > >> >>> >        log.info("Getting Iterator");
> > >> >>> >
> > >> >>> >        Iterator<RowResult> iter = s.iterator();
> > >> >>> >        log.info("Got Iterator");
> > >> >>> >
> > >> >>> >        while (iter.hasNext())
> > >> >>> >        {
> > >> >>> >            log.info("Getting next Row");
> > >> >>> >            RowResult rr = iter.next();
> > >> >>> >
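To put a number on "forever", as Ryan asks, the loop in this snippet could be instrumented so that each iterator call reports how long it blocked. `TimedIterator` is a hypothetical helper, not part of HBase; it wraps any `java.util.Iterator` (such as the one returned by `s.iterator()`) and needs Java 8 or later for `BiConsumer`.

```java
import java.util.Iterator;
import java.util.function.BiConsumer;

// Wraps an Iterator and reports, for every hasNext()/next() call, the method
// name and how many milliseconds the underlying call blocked.
final class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final BiConsumer<String, Long> report; // (method name, elapsed ms)

    TimedIterator(Iterator<T> delegate, BiConsumer<String, Long> report) {
        this.delegate = delegate;
        this.report = report;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        boolean more = delegate.hasNext();
        report.accept("hasNext", (System.nanoTime() - start) / 1_000_000L);
        return more;
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        T value = delegate.next();
        report.accept("next", (System.nanoTime() - start) / 1_000_000L);
        return value;
    }
}
```

For example: `Iterator<RowResult> iter = new TimedIterator<>(s.iterator(), (method, ms) -> log.info(method + " blocked for " + ms + " ms"));`. A single long wait on the first `hasNext()` followed by quick calls would be consistent with the server filling the first batch; steady long waits on every call would point elsewhere, such as the region server itself.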
> > >> >>> >
> > >> >>> > Ananth T Sarathy
> > >> >>> >
> > >> >>>
> > >> >>
> > >> >
> > >>
> > >
> > >
> >
>
