hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: How to scan only Memstore from end point co-processor
Date Mon, 01 Jun 2015 06:39:49 GMT
We have a postScannerOpen hook in the CP, but it may not give you direct
access to know which of the internal scanners are on the Memstore and
which are on the store files. This is possible, but we may need to
add some new hooks at the place where we explicitly add the internal
scanners required for a scan.

But still, a general question: are you sure that your data will be only in
the memstore, and that the latest data will not have been flushed from
your memstore to the HFiles by that time? I see that your scenario is write
centric, so how can you guarantee that your data is in the memstore only?
Your time range may say it is the latest data (maybe 10 to 15 min),
but you should be able to configure your memstore flushing in such a way
that no flushes happen for the latest data in that 10 to 15
min window. Just sharing my thoughts here.

On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah <gborah@appdynamics.com> wrote:

> Hi all,
> Here is our use case,
> We have a very write-heavy cluster. We also run periodic end point
> co-processor based jobs every 10 minutes that operate on the data written
> in the last 10-15 minutes.
> Is there a way to only query in the MemStore from the end point
> co-processor? The periodic job scans for data using a time range. We would
> like to implement a simple logic,
> a. If the query time range is within MemStore's TimeRangeTracker, then
> query only the MemStore.
> b. If the end time of the query time range is within MemStore's
> TimeRangeTracker, but the query start time is outside MemStore's
> TimeRangeTracker (a memstore flush happened), then query both MemStore and
> files.
> c. If the start time and end time of the query are both outside MemStore's
> TimeRangeTracker, we query only files.
> The incoming data is time series and we do not allow old data (out of sync
> from clock) to come into the system(HBase).
> Cloudera has a scanner org.apache.hadoop.hbase.regionserver.InternalScan,
> that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
> this available in Trunk?
> Also, how do I access the Memstore for a Column Family in the end point
> co-processor from CoprocessorEnvironment?
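
To make cases a, b and c above concrete, here is a rough sketch of just the
decision step in plain Java. It does not touch any HBase API: the class
ScanTargetPlanner, the Target enum and the plan() method are hypothetical
names made up for illustration, and the memstore's minimum and maximum
timestamps are passed in as plain longs. It assumes, as the original message
states, that old data never arrives late. Anything that a, b and c do not
spell out (for example a query that ends after the memstore's newest
timestamp) falls back to reading both, which is the conservative choice.

```java
public final class ScanTargetPlanner {

    /** Which stores a time-range scan would have to read. */
    public enum Target { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    private ScanTargetPlanner() { }

    /**
     * Decide where to scan for the inclusive range [queryStart, queryEnd],
     * given the inclusive range [memMin, memMax] of timestamps currently
     * held in the memstore.
     *
     * Convention of this sketch only: memMin > memMax means "memstore empty".
     */
    public static Target plan(long queryStart, long queryEnd,
                              long memMin, long memMax) {
        if (queryStart > queryEnd) {
            throw new IllegalArgumentException("queryStart must be <= queryEnd");
        }
        if (memMin > memMax) {
            // Nothing in the memstore, so only the files can hold data.
            return Target.FILES_ONLY;
        }
        boolean startInside = queryStart >= memMin && queryStart <= memMax;
        boolean endInside = queryEnd >= memMin && queryEnd <= memMax;

        if (startInside && endInside) {
            return Target.MEMSTORE_ONLY;        // case a
        }
        if (queryEnd < memMin) {
            return Target.FILES_ONLY;           // case c: range is entirely older
        }
        // Case b, plus every case the message does not cover.
        return Target.MEMSTORE_AND_FILES;
    }

    public static void main(String[] args) {
        long memMin = 1000L;
        long memMax = 2000L;
        System.out.println(plan(1200L, 1800L, memMin, memMax)); // case a
        System.out.println(plan(500L, 1500L, memMin, memMax));  // case b
        System.out.println(plan(100L, 900L, memMin, memMax));   // case c
    }
}
```

Note that this only covers the decision. A flush that happens between
reading the memstore's time range and running the scan would invalidate the
result, so the check and the scan would need to happen under the same view
of the store.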
