hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: RegionServer shutdown with ScanWildcardColumnTracker exception
Date Wed, 17 Apr 2013 18:34:35 GMT
No. It happened in our production environment after running counter
increments every 5 minutes for a few weeks. I could try to reproduce it in a
test cluster, but that would mean running for weeks as well. I will keep
digging and let you know if it happens again and/or I have more information
or insights on the issue.
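
For anyone who wants to check a dump of qualifiers offline, here is a minimal
sketch of the kind of ordering check the exception message describes. It is
plain Java with no HBase dependency and is not the actual
ScanWildcardColumnTracker code. It assumes qualifiers are compared
lexicographically as unsigned bytes; the class and method names are made up
for illustration, and the first two qualifiers are just examples following the
hourly naming scheme described further down the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase.
class QualifierOrderCheck {

    // Lexicographic comparison that treats every byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter array sorts first.
        return a.length - b.length;
    }

    // Returns the index of the first qualifier that sorts before its
    // predecessor, or -1 if the list is in non-decreasing order.
    static int firstOutOfOrder(List<String> qualifiers) {
        for (int i = 1; i < qualifiers.size(); i++) {
            byte[] prev = qualifiers.get(i - 1).getBytes(StandardCharsets.UTF_8);
            byte[] cur = qualifiers.get(i).getBytes(StandardCharsets.UTF_8);
            if (compareUnsigned(cur, prev) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> seen = Arrays.asList(
                "impressions_0_2013041617",    // illustrative
                "impressions_ALL_2013041617",  // illustrative
                "imprersions_ALL_2013041617"); // the name reported in the log
        // The misspelled name sorts before "impressions..." because 'r' < 's'.
        System.out.println("first out-of-order index: " + firstOutOfOrder(seen));
    }
}
```

Running it over the qualifiers returned by a scan of the affected row should
show whether the stored data itself is out of order.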

Thanks.


On Wed, Apr 17, 2013 at 8:18 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Are there any test cases that try to reproduce your issue?
>
> Regards
> Ram
>
>
> On Wed, Apr 17, 2013 at 9:47 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > There is a hint mechanism available when scanning happens.  But I don't
> > think there should be much difference between a scan that happens during
> > a flush and a normal scan.
> >
> > Will look through the code and come back on this.
> >
> > Regards
> > RAm
> >
> >
> > On Wed, Apr 17, 2013 at 9:40 PM, Amit Sela <amits@infolinks.com> wrote:
> >
> >> No, no encoding.
> >>
> >>
> >> On Wed, Apr 17, 2013 at 6:56 PM, ramkrishna vasudevan <
> >> ramkrishna.s.vasudevan@gmail.com> wrote:
> >>
> >> > @Lars
> >> > You have any suggestions on this?
> >> >
> >> > @Amit
> >> > You have any Encoder enabled like the Prefix Encoding stuff?
> >> > There was one optimization added recently but that is not in 0.94.2
> >> >
> >> > Regards
> >> > Ram
> >> >
> >> >
> >> > On Wed, Apr 17, 2013 at 5:17 PM, Amit Sela <amits@infolinks.com> wrote:
> >> >
> >> > > I scanned over this counter with and without column specification and all
> >> > > looks OK now.
> >> > > I have no CPs in this table.
> >> > > Is there some kind of hint mechanism in HBase's internal scan? I ask because
> >> > > it's weird that ScanWildcardColumnTracker.checkColumn says that the column is
> >> > > smaller than the previous column: *imprersions_ALL_2013041617*. There is no
> >> > > imprersions, only impressions, and r is indeed smaller than s. Could it be
> >> > > some kind of hint bug? I don't think I know enough of HBase internals to
> >> > > fully understand that...
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Apr 17, 2013 at 1:42 PM, ramkrishna vasudevan <
> >> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > > Hi Amit
> >> > > >
> >> > > > Checking the code, this is possible when the qualifiers are not sorted.
> >> > > > Do you have any CPs in your path which try to play with the KVs?
> >> > > >
> >> > > > Seems to be a very weird thing.
> >> > > > Can you try doing a scan on the KV just before this happens? That will
> >> > > > tell you the existing KVs that are present.
> >> > > >
> >> > > > Even now, if you still have the cluster, you can try scanning the
> >> > > > region for which the flush happened.  That will give us some more info.
> >> > > >
> >> > > > Regards
> >> > > > Ram
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 17, 2013 at 2:36 PM, Amit Sela <amits@infolinks.com> wrote:
> >> > > >
> >> > > > > The cluster runs Hadoop 1.0.4 and HBase 0.94.2
> >> > > > >
> >> > > > > I have three families in this table: weekly, daily, hourly. Each family
> >> > > > > has the following qualifiers:
> >> > > > > Weekly - impressions_{countrycode}_{week#} - country code is 0, 1 or
> >> > > > > ALL (aggregation of both 0 and 1)
> >> > > > > Daily and hourly are the same but with yyyyMMdd and yyyyMMddhh
> >> > > > > respectively.
> >> > > > >
> >> > > > > Just before the exception the regionserver StoreFile executes the
> >> > > > > following:
> >> > > > >
> >> > > > > 2013-04-16 17:56:06,769 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hadoop-master.infolinks.com:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc: CompoundBloomFilterWriter
> >> > > > > 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master-address:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc)
> >> > > > > 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=210517246, memsize=39.3m, into tmp file hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc
> >> > > > > 2013-04-16 17:56:07,357 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003: CompoundBloomFilterWriter
> >> > > > > 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003)
> >> > > > > 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server region-server-address,8041,1364993168088: Replay of HLog required. Forcing server shutdown
> >> > > > > DroppedSnapshotException: region: TABLE,ROWKEY,1364317591568.*af2760e4d04a9e3025d1fb53bdba8acf*.
> >> > > > > ....
> >> > > > > ....
> >> > > > > ...
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Apr 17, 2013 at 11:47 AM, ramkrishna vasudevan <
> >> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >
> >> > > > > > Seems interesting.  Can you tell us what families and qualifiers
> >> > > > > > are available in your schema?
> >> > > > > >
> >> > > > > > Any other interesting logs that you can see before this?
> >> > > > > >
> >> > > > > > BTW, which version of HBase is this?  If we can track it down we can
> >> > > > > > then file a JIRA if it is a bug.
> >> > > > > >
> >> > > > > > Regards
> >> > > > > > RAm
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Apr 17, 2013 at 2:00 PM, Amit Sela <amits@infolinks.com> wrote:
> >> > > > > >
> >> > > > > > > Hi all,
> >> > > > > > >
> >> > > > > > > I had a regionserver crash during counter increments. Looking at the
> >> > > > > > > regionserver log I saw:
> >> > > > > > >
> >> > > > > > > org.apache.hadoop.hbase.DroppedSnapshotException: region: TABLE_NAME, ROW_KEY...
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1472)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1351)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1292)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
> >> > > > > > >         at java.lang.Thread.run(Thread.java:722)
> >> > > > > > > Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column: *QUALIFIER*
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:354)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:362)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:738)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:673)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.Store.access$400(Store.java:108)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2276)
> >> > > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1447)
> >> > > > > > >
> >> > > > > > > The strange thing is that the *QUALIFIER* name as it appears in the
> >> > > > > > > log is misspelled... there is no such qualifier name, and there never
> >> > > > > > > was.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Amit.
