hbase-user mailing list archives

From "ming.liu" <ming....@esgyn.cn>
Subject RE: Will Scan use blockcache?
Date Wed, 02 Jan 2019 01:46:28 GMT
Thank you Stack,
I will do more testing. The code you pointed out makes it very clear that get() uses a scan.
I believe the performance difference comes from the RPCs. The get test makes one call, table.get();
the scan test calls two APIs, getScanner() and then next() on the ResultScanner,
which I believe is another RPC.

I will test more. If necessary, I will file a JIRA.
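
For the further testing mentioned here, the measurement described later in the thread (record a start time, run the operation N times, divide the elapsed time by N) could be sketched roughly as below. This is plain Java with no HBase dependency: `LatencyHarness`, `averageNanos`, and the warm-up count are hypothetical names chosen for illustration, and the two `Runnable` bodies are placeholders for where a `table.get(...)` call or a `getScanner(...)` plus `next()` sequence would go.

```java
// Minimal timing sketch. Nothing here is an HBase API; the Runnable passed in
// is an assumed stand-in for the operation being measured.
final class LatencyHarness {

    // Runs `op` `warmup` times untimed, then `iterations` times timed,
    // and returns the average wall-clock time per timed call in nanoseconds.
    static double averageNanos(int warmup, int iterations, Runnable op) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        long end = System.nanoTime();
        return (end - start) / (double) iterations;
    }

    public static void main(String[] args) {
        // Placeholder: in a real test this would issue table.get(new Get(row)).
        Runnable getOp = () -> { };
        // Placeholder: in a real test this would open a scanner with
        // table.getScanner(scan), call next() once, and close the scanner.
        Runnable scanOp = () -> { };

        System.out.println("get  avg ns: " + averageNanos(100, 1000, getOp));
        System.out.println("scan avg ns: " + averageNanos(100, 1000, scanOp));
    }
}
```

Running both operations through the same harness, with the same warm-up, keeps JIT and connection setup costs out of the comparison, so what remains is closer to the per-call cost of each code path.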

thanks,
Ming

-----Original Message-----
From: Stack <stack@duboce.net> 
Sent: Tuesday, January 01, 2019 6:29 AM
To: Hbase-User <user@hbase.apache.org>
Subject: Re: Will Scan use blockcache?

On Sat, Dec 29, 2018 at 8:06 AM ming.liu <ming.liu@esgyn.cn> wrote:

> Thanks Stack,
>
> I had the impression that Get makes a Scan under the covers. But that
> cannot explain the performance difference I observe between a Get of a
> single row and a Scan of a single row.
>
>
Here is how the Get gets converted into a Scan:
https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6920
Maybe try doing the same in your experiment, and if there is still a difference, file an
issue and upload your test code. Explain how you ran your test (copy/paste
from here). branch-1.2 is old. I'd be interested in trying your test
against branch-2 to see if it has the issue you see.


> I assume the difference comes from the blockcache: Get() will first look in
> the block cache, and if it finds a match, the call finishes and returns. But Scan
> will not look in the block cache; it will go to the memstore and then to the HFile
> if the row is not in the memstore.
>
>
We first go to memstore, and if we have not satisfied the query, then go to
hfiles. Hfiles will fetch blocks from blockcache if present, else will
go to hdfs (and then populate the cache). It should work this way whether Get or
Scan.
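
The read order described above can be illustrated with a toy model. This is only a sketch: `ReadPathSketch` and all of its fields and methods are hypothetical, the maps merely stand in for the memstore, blockcache, and HDFS, and none of it is HBase code. It shows a get delegating to a one-row scan, with both taking the same memstore, then blockcache, then HDFS path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: none of these classes, fields, or methods are HBase APIs.
final class ReadPathSketch {
    private final TreeMap<String, String> memstore = new TreeMap<>();
    // Hypothetical index from row key to the id of the block that holds it.
    private final TreeMap<String, String> rowToBlock = new TreeMap<>();
    private final Map<String, Map<String, String>> hdfsBlocks = new HashMap<>();
    private final Map<String, Map<String, String>> blockCache = new HashMap<>();
    int hdfsReads = 0; // counts simulated trips to HDFS

    // A write lands in the memstore.
    void put(String row, String value) {
        memstore.put(row, value);
    }

    // Simulates a row that was flushed to an HFile block long ago.
    void storeInHFile(String blockId, String row, String value) {
        hdfsBlocks.computeIfAbsent(blockId, id -> new HashMap<>()).put(row, value);
        rowToBlock.put(row, blockId);
    }

    // Memstore first, then the block via blockcache, falling back to HDFS.
    private String readRow(String row) {
        String fromMemstore = memstore.get(row);
        if (fromMemstore != null) {
            return fromMemstore;
        }
        String blockId = rowToBlock.get(row);
        if (blockId == null) {
            return null;
        }
        Map<String, String> block = blockCache.get(blockId);
        if (block == null) {
            block = hdfsBlocks.get(blockId); // cache miss: read from "HDFS"
            hdfsReads++;
            blockCache.put(blockId, block);  // populate the cache
        }
        return block.get(row);
    }

    // Scan over [startRow, stopRow], both inclusive in this toy model.
    TreeMap<String, String> scan(String startRow, String stopRow) {
        TreeSet<String> rows = new TreeSet<>(memstore.subMap(startRow, true, stopRow, true).keySet());
        rows.addAll(rowToBlock.subMap(startRow, true, stopRow, true).keySet());
        TreeMap<String, String> results = new TreeMap<>();
        for (String row : rows) {
            String value = readRow(row);
            if (value != null) {
                results.put(row, value);
            }
        }
        return results;
    }

    // A get is a one-row scan.
    String get(String row) {
        return scan(row, row).get(row);
    }

    public static void main(String[] args) {
        ReadPathSketch store = new ReadPathSketch();
        store.storeInHFile("block-0", "row1", "v1");
        System.out.println("get:  " + store.get("row1") + ", hdfsReads=" + store.hdfsReads);
        System.out.println("scan: " + store.scan("row1", "row1") + ", hdfsReads=" + store.hdfsReads);
    }
}
```

In this model the first read of a flushed row pays for one simulated HDFS read, and every later read of the same block is served from the cache, regardless of whether it arrives as a get or a scan.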

Thanks,
S


> My test program does Get in a loop, for example, 1000 Gets.
> Before the loop, I save the start time, and then after 1000 Gets,
> I save the end time. So (end time - start time) / loop count is the time spent in
> each Get operation.
> I have the same loop with get() replaced by scan(). The scan() has
> startRowKey = endRowKey, so it is just one row.
>
> I ran the test program many times, using HBase 1.2.0. It shows that the Scan is
> 2x slower than the Get. So I want to understand the root cause. I assume
> get() will match the row in the blockcache, so it will not go to the memstore
> or HFile. But scan() must go to the HFile, because in my test there is no put
> operation, just pure reads. The row was inserted a long time ago, so it should
> have been flushed to an HFile and no longer be in the memstore. But I cannot
> confirm/verify this. So scan() has to send a request to HDFS to read from
> the HFile, and it is slower than the get() operation.
>
> I can paste the test program if the description is still not clear.
>
> I may need to replace Scan with Get wherever possible, if there really is a
> performance difference. But if there is not, I won't bother to modify
> this.
>
> thanks,
> Ming
>
> -----Original Message-----
> From: Stack <stack@duboce.net>
> Sent: Saturday, December 29, 2018 11:50 PM
> To: Hbase-User <user@hbase.apache.org>
> Subject: Re: Will Scan use blockcache?
>
> A Get is a one-row Scan. Under the covers the Get makes a Scan. Scan/Get
> both have to go to memstore since it will have latest versions of Cells.
>
> Say more about how you are doing the compare please.
>
> S
>
> On Sat, Dec 29, 2018 at 7:02 AM ming.liu <ming.liu@esgyn.cn> wrote:
>
> > Hi, all,
> >
> >
> >
> > I recently found that a short scan is slower than a get operation in HBase.
> > That is acceptable, but I really want to understand the reason.
> >
> >
> >
> > My testing table has only one row in it, so both Scan and Get return just
> > one row. Scan is still about 2x slower than the get operation.
> >
> > So I want to understand the difference between get(rowkey) and
> > Scan(rowkey, rowkey).
> >
> >
> >
> > I think Get will first look in the blockcache and, if it finds a match, return
> > without accessing the HFile/memstore.
> >
> > Will Scan search the blockcache as well? Or does it go directly to the
> > memstore/HFile?
> >
> >
> >
> > thanks,
> >
> > Ming
> >
> >
> >
> >
>
>

