From: Anoop John
To: "user@hbase.apache.org"
Date: Mon, 1 Jun 2015 12:15:42 +0530
Subject: Re: How to scan only Memstore from end point co-processor
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7bf0d9fab0d0ce05176f2c07

--047d7bf0d9fab0d0ce05176f2c07
Content-Type: text/plain; charset=UTF-8

If your scan has a time range specified, HBase will internally check it
against the time ranges of the files etc. and skip those that are clearly
outside the time range you are interested in. You don't have to do anything
for this. Just make sure you set the TimeRange on your read.

-Anoop-

On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> We have a postScannerOpen hook in the CP, but that may not give you direct
> access to know which of the internal scanners are on the Memstore and
> which are on the store files.
> This is possible, but we may need to add some new hooks at the place
> where we explicitly add the internal scanners required for a scan.
>
> But still, a general question: are you sure that your data will be only in
> the memstore, and that the latest data will not have been flushed by that
> time from your memstore to the HFiles? I see that your scenario is write
> centric, so how can you guarantee that your data is in the memstore only?
> Your time range may say it is the latest data (maybe 10 to 15 min),
> but you would need to be able to configure your memstore flushing in such
> a way that no flushes happen for the latest data in that 10 to 15 min
> window. Just sharing my thoughts here.
>
> On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah wrote:
>
> > Hi all,
> >
> > Here is our use case.
> >
> > We have a very write-heavy cluster. We also run periodic end point
> > co-processor based jobs, every 10 minutes, that operate on the data
> > written in the last 10-15 minutes.
> >
> > Is there a way to query only the MemStore from the end point
> > co-processor? The periodic job scans for data using a time range. We
> > would like to implement some simple logic:
> >
> > a. If the query time range is within the MemStore's TimeRangeTracker,
> > then query only the MemStore.
> > b. If the end time of the query time range is within the MemStore's
> > TimeRangeTracker, but the query start time is outside the MemStore's
> > TimeRangeTracker (a memstore flush happened), then query both the
> > MemStore and the files.
> > c. If both the start time and the end time of the query are outside the
> > MemStore's TimeRangeTracker, query only the files.
> >
> > The incoming data is time series, and we do not allow old data (out of
> > sync with the clock) to come into the system (HBase).
> >
> > Cloudera has a scanner, org.apache.hadoop.hbase.regionserver.InternalScan,
> > that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
> > this available in trunk?
> >
> > Also, how do I access the Memstore for a Column Family in the end point
> > co-processor from CoprocessorEnvironment?
> >

--047d7bf0d9fab0d0ce05176f2c07--
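
The a/b/c rules in the original question can be sketched as a small decision function. This is only an illustration in plain Java, not HBase code: the class MemstoreScanPlanner, the Target enum and the plan() method are hypothetical names, and the memstore's TimeRangeTracker is modelled as two inclusive timestamps (min, max), with "min > max" assumed to mean an empty memstore. The query range is treated as half-open, [start, end), which is how Scan.setTimeRange(minStamp, maxStamp) interprets its arguments. One case the three rules do not cover, a query range that spans the whole tracker, falls back to reading both sources here; that fallback is an addition of this sketch, chosen because reading both is always safe.

```java
// Hypothetical sketch of the a/b/c rules from the thread; not an HBase API.
final class MemstoreScanPlanner {

    /** Where a time-range query would have to read from. */
    enum Target { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    /**
     * @param queryStart inclusive start of the query time range
     * @param queryEnd   exclusive end of the query time range
     * @param memMin     smallest timestamp seen by the memstore (inclusive)
     * @param memMax     largest timestamp seen by the memstore (inclusive)
     */
    static Target plan(long queryStart, long queryEnd, long memMin, long memMax) {
        if (queryStart >= queryEnd) {
            throw new IllegalArgumentException("empty or inverted query range");
        }
        // Assumption of this sketch: an empty memstore is reported as min > max.
        if (memMin > memMax) {
            return Target.FILES_ONLY;
        }
        long queryLast = queryEnd - 1; // last timestamp the query can match

        boolean startInside = queryStart >= memMin && queryStart <= memMax;
        boolean endInside = queryLast >= memMin && queryLast <= memMax;

        // Rule a: the whole query range lies within the memstore's range.
        // This relies on the thread's premise that no old data arrives late.
        if (startInside && endInside) {
            return Target.MEMSTORE_ONLY;
        }
        // Rule b, plus every other partial overlap (conservative fallback).
        boolean overlaps = queryStart <= memMax && queryLast >= memMin;
        if (overlaps) {
            return Target.MEMSTORE_AND_FILES;
        }
        // Rule c: the query range does not touch the memstore's range at all.
        return Target.FILES_ONLY;
    }

    public static void main(String[] args) {
        // Memstore currently holds timestamps 50..300 (made-up numbers).
        System.out.println(plan(100, 200, 50, 300)); // MEMSTORE_ONLY
        System.out.println(plan(10, 200, 50, 300));  // MEMSTORE_AND_FILES
        System.out.println(plan(10, 40, 50, 300));   // FILES_ONLY
    }
}
```

Note that with a plain time-range scan none of this is needed on the client side, since, as the reply at the top says, store files outside the range are skipped automatically. The sketch only shows what the proposed rules would decide.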