hbase-user mailing list archives

From Gautam Borah <gbo...@appdynamics.com>
Subject Re: How to scan only Memstore from end point co-processor
Date Tue, 02 Jun 2015 02:35:01 GMT
Thanks Vladimir. We will try this out soon.

Regards,
Gautam

On Mon, Jun 1, 2015 at 12:22 AM, Vladimir Rodionov <vladrodionov@gmail.com>
wrote:

> InternalScan has a constructor that takes a Scan object.
>
> See https://issues.apache.org/jira/browse/HBASE-12720
>
> You can instantiate an InternalScan from a Scan, call checkOnlyMemStore(),
> and then open a RegionScanner. The better approach, though, is to cache
> data on write and run a regular RegionScanner served from the memstore and
> block cache.
>
> best,
> -Vlad
>
>
>
>
> On Sun, May 31, 2015 at 11:45 PM, Anoop John <anoop.hbase@gmail.com>
> wrote:
>
> > If your scan has a time range specified, HBase internally checks it
> > against the time range of each store file and skips the files that are
> > clearly outside the range you are interested in. You don't have to do
> > anything for this. Just make sure you set the TimeRange for your read.
> >
> > -Anoop-
> >
> > On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > We have a postScannerOpen hook in the CP, but it may not give you
> > > direct access to know which internal scanners are on the memstore and
> > > which are on the store files. This is possible, but we may need to add
> > > some new hooks at the place where we explicitly add the internal
> > > scanners required for a scan.
> > >
> > > But still, a general question: are you sure that your data will be only
> > > in the memstore, and that the latest data will not have been flushed
> > > from the memstore to the HFiles by that time? I see that your scenario
> > > is write-centric, so how can you guarantee that your data is in the
> > > memstore only? Your time range may say it is the latest data (maybe 10
> > > to 15 min), but you would need to be able to configure memstore
> > > flushing in such a way that no flushes happen for the latest data in
> > > that 10 to 15 min window. Just sharing my thoughts here.
> > >
> > >
> > >
> > >
> > > On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah <gborah@appdynamics.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Here is our use case,
> > > >
> > > > We have a very write-heavy cluster. We also run periodic endpoint
> > > > coprocessor based jobs every 10 minutes that operate on the data
> > > > written in the last 10-15 minutes.
> > > >
> > > > Is there a way to query only the MemStore from the endpoint
> > > > coprocessor? The periodic job scans for data using a time range. We
> > > > would like to implement some simple logic:
> > > >
> > > > a. If the query time range is within the MemStore's TimeRangeTracker,
> > > > query only the MemStore.
> > > > b. If the end time of the query time range is within the MemStore's
> > > > TimeRangeTracker, but the query start time is outside it (a memstore
> > > > flush happened), query both the MemStore and the files.
> > > > c. If both the start time and the end time of the query are outside
> > > > the MemStore's TimeRangeTracker, query only the files.
> > > >
> > > > The incoming data is time series, and we do not allow old data (out of
> > > > sync with the clock) to come into the system (HBase).
> > > >
> > > > Cloudera has a scanner,
> > > > org.apache.hadoop.hbase.regionserver.InternalScan, that has methods
> > > > like checkOnlyMemStore() and checkOnlyStoreFiles(). Is this available
> > > > in trunk?
> > > >
> > > > Also, how do I access the MemStore for a column family in the endpoint
> > > > coprocessor from the CoprocessorEnvironment?
> > > >
> > >
> >
>
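
The a/b/c rule from the original question can be sketched in plain Java as a comparison of two time ranges. This is only a minimal sketch, not HBase code: the class ScanTargetChooser, the Target enum, and the choose method are hypothetical names, all bounds are assumed to be inclusive epoch milliseconds, and obtaining the MemStore's minimum and maximum timestamps inside a coprocessor is left out. It also deviates from rule (c) in one respect: any partial overlap, including a query range that straddles the whole MemStore range, falls back to scanning both, which is the conservative choice.

```java
class ScanTargetChooser {

    /** Which part of the store a scan needs to read (hypothetical enum). */
    enum Target { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    /**
     * Chooses a scan target by comparing the query time range with the
     * MemStore's tracked time range. All four bounds are inclusive
     * (an assumption made for this sketch).
     */
    static Target choose(long queryStart, long queryEnd, long memMin, long memMax) {
        if (queryStart > queryEnd || memMin > memMax) {
            throw new IllegalArgumentException("range start must not exceed range end");
        }
        // Rule (c): the query range does not touch the MemStore range at all.
        if (queryEnd < memMin || queryStart > memMax) {
            return Target.FILES_ONLY;
        }
        // Rule (a): the query range lies entirely inside the MemStore range.
        if (queryStart >= memMin && queryEnd <= memMax) {
            return Target.MEMSTORE_ONLY;
        }
        // Rule (b) and every other partial overlap: read both to be safe.
        return Target.MEMSTORE_AND_FILES;
    }

    public static void main(String[] args) {
        long memMin = 1000L;
        long memMax = 2000L;
        System.out.println(choose(1200L, 1800L, memMin, memMax)); // rule (a)
        System.out.println(choose(500L, 1500L, memMin, memMax));  // rule (b)
        System.out.println(choose(100L, 900L, memMin, memMax));   // rule (c)
    }
}
```

The result could then decide whether to call checkOnlyMemStore() or checkOnlyStoreFiles() on the InternalScan mentioned in the thread. As the replies point out, a flush can move recent data out of the MemStore at any time, and setting the TimeRange on a regular scan already lets HBase skip store files outside the range.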
