hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase/Stargate dataflow in I/O perspective
Date Tue, 01 Apr 2014 07:33:39 GMT
Don't expect any kind of performance when running HBase on VMs, by
definition, although exactly how bad it is will depend on the VM host
environment, the allocations per container, and so on.

As for your questions:

> When is HFile(StoreFile) being loaded as a region into region server's
> memory?

Stargate is just another HBase client from the perspective of the
RegionServers. So, the RegionServers will service requests from the REST
gateway as needed, reading store files on demand.

> Does a region stay in region server's memory afterward? When is it being
> freed?

It depends, although the REST gateway pessimistically calls
Scan#setCacheBlocks(false), so scans from REST are unlikely to result in
HFile block caching if those blocks are not already in cache. If they
happen to be in cache because of a request from another type of client,
then the RegionServers will use the cached blocks and update the usage
counts for those blocks for LRU, etc.
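
To make the caching behaviour described above concrete, here is a minimal
sketch in Python. It is a toy model, not HBase's actual block cache: the
BlockCache class, its read() method, and the block ids are all hypothetical
names made up for illustration. It only shows the idea that a read with
block caching turned off can still hit blocks another client cached (and
refresh their LRU position), but does not add new blocks on a miss.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache keyed by block id (hypothetical, not HBase's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used entry comes first

    def read(self, block_id, cache_blocks=True):
        """Return 'hit' or 'miss' for a single block read."""
        if block_id in self.blocks:
            # A cached block is served from memory and marked recently used,
            # regardless of the caller's cache_blocks setting.
            self.blocks.move_to_end(block_id)
            return "hit"
        # Miss: the block would have to be read from the store file.
        if cache_blocks:
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return "miss"


cache = BlockCache(capacity=2)
cache.read("b1")  # some other client reads b1 with caching enabled

# A REST-style scan with block caching disabled touches b1, then b2 twice.
rest_results = [cache.read(b, cache_blocks=False) for b in ("b1", "b2", "b2")]
cached_ids = list(cache.blocks)
print(rest_results, cached_ids)
```

In this model the second read of b2 is still a miss, because the scan never
populated the cache; only b1, cached by the other client, stays resident.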

> When Stargate uses a scan instance to obtain data, does it communicate
with region server with another connection overhead?

It looks like this: REST client <--> Stargate <--> RegionServers
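
As a rough back-of-the-envelope sketch of what that chain implies, the
Python snippet below turns the numbers from the original message (400K rows
in about 5 minutes) into a throughput and a per-fetch time budget. The
rows_per_fetch value of 100 is purely an assumption for illustration; the
batch size actually used by the client is not stated anywhere in this
thread, and scan_estimate is a hypothetical helper, not part of any HBase
API.

```python
import math


def scan_estimate(rows, seconds, rows_per_fetch):
    """Return (rows per second, number of fetches, milliseconds per fetch)."""
    throughput = rows / seconds
    # Each fetch is one request from the REST client to the gateway, which
    # in turn talks to the RegionServers: two hops per fetch.
    fetches = math.ceil(rows / rows_per_fetch)
    per_fetch_ms = seconds * 1000 / fetches
    return throughput, fetches, per_fetch_ms


# 400K rows in about 5 minutes, with an assumed 100 rows per fetch.
throughput, fetches, per_fetch_ms = scan_estimate(
    rows=400_000, seconds=5 * 60, rows_per_fetch=100
)
print(f"{throughput:.0f} rows/s, {fetches} fetches, {per_fetch_ms:.0f} ms each")
```

Under that assumed batch size, each fetch would have a budget of roughly
75 ms spread over both hops. A smaller batch would mean more round trips
and a smaller budget per fetch, which is one place to look for the
bottleneck.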




On Tue, Apr 1, 2014 at 9:18 AM, yglin <yglin.mlanser@gmail.com> wrote:

> Hi~
>
> I would like to know how data flows when you query it from HBase or
> Stargate, especially from an I/O perspective.
> Please point me to some directions to study.
> That means questions like the ones below:
> When is HFile(StoreFile) being loaded as a region into region server's
> memory?
> Does a region stay in region server's memory afterward? When is it being
> freed?
> When Stargate uses a scan instance to obtain data, does it communicate with
> region server with another connection overhead?
>
> Actually, I'm asking because I'm experimenting with Toad for Cloud Database
> on HBase.
> And I ran into a performance issue: querying 400K data rows takes about 5
> minutes, which is kind of an awkward number.
> I installed HBase/HDFS on 7 VMs:
> 1 as ResourceManager, 1 as NameNode and HMaster, and 5 as DataNodes and
> RegionServers.
> I have barely changed any configuration for performance tuning.
> I drew a very simple chart to try to find where the bottlenecks are.
> <
> http://apache-hbase.679495.n3.nabble.com/file/n4057719/Toad_Read_HBase_Process.png
> >
>
> I know I could be missing many details in this simple chart.
> Please give me some clues.
> Much appreciated.
>
> yglin
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/HBase-Stargate-dataflow-in-I-O-perspective-tp4057719.html
> Sent from the HBase User mailing list archive at Nabble.com.
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
