hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: hadoop dfs.replication parameter and hbase/performance for random/scanner access
Date Tue, 05 Jan 2010 14:34:18 GMT
> I am not sure if HBase or Hadoop is responsible for choosing the location of
> the replicas. Having more replicas may not avoid the random-read limitations
> of the disks, but it should probably avoid network latency?
> If I have a web application with N clients accessing HBase, and one of
> those clients has to get the value for a key, it should be faster to access
> it if the value for that key is stored on that node (as we avoid a network
> call)? But you are right, it does not seem I can get around the disk random
> read performance limitations.

The Namenode chooses the replicas' locations, always starting with the
local datanode if one exists. It will be faster for HBase to fetch a
block from a local Datanode, that is true. If the client runs on the same
node as the RegionServer, which is itself on the Datanode that has the block
containing your key, you will probably save some more trips, but that's
not what you want to do (you don't want clients competing with the DB).
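To make the placement rule above concrete, here is a simplified Python sketch of the default HDFS placement heuristic (this is an illustration, not the actual Namenode code; the hosts and racks are made up): first replica on the writer's local datanode if the writer runs one, second replica on a node in a different rack, third on another node in the second replica's rack.

```python
def choose_replica_nodes(writer_host, nodes, replication=3):
    """Simplified sketch of default HDFS replica placement (hypothetical
    data, not the real Namenode logic)."""
    chosen = []
    # 1st replica: the writer's local datanode, when the writer is one.
    local = next((n for n in nodes if n["host"] == writer_host), nodes[0])
    chosen.append(local)
    # 2nd replica: any node on a different rack than the first.
    remote = next(n for n in nodes if n["rack"] != local["rack"])
    chosen.append(remote)
    # 3rd replica: a different node on the same rack as the second.
    third = next(n for n in nodes
                 if n["rack"] == remote["rack"] and n is not remote)
    chosen.append(third)
    return chosen[:replication]

nodes = [
    {"host": "dn1", "rack": "r1"},
    {"host": "dn2", "rack": "r1"},
    {"host": "dn3", "rack": "r2"},
    {"host": "dn4", "rack": "r2"},
]
picks = choose_replica_nodes("dn3", nodes)
print([n["host"] for n in picks])  # → ['dn3', 'dn1', 'dn2']
```

Because the first replica lands on the writer's own datanode, a RegionServer that has been running for a while ends up with local copies of the blocks it writes, which is why local reads are the common case.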

> Unfortunately I am not in a position to really benchmark my application as I
> currently can't run it on a true cluster (using a cluster of virtual
> machines would lead to obviously wrong results ;). At this stage I am just
> trying to understand how hbase/hadoop works to avoid big mistakes in the
> design of the architecture. My application currently runs in production on a
> postgresql database: I replicate it over several nodes and read access
> performs better when I have more replicas because each node connects to a
> local database.

With HBase you have to consider that the region servers serve regions
whose blocks can be located on many Datanodes, most of the
time the local one. HBase doesn't serve the same data from more than one
region server; instead it applies horizontal partitioning to your table
automatically.
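The horizontal partitioning can be pictured as a sorted list of region start keys, where each region covers a half-open key range and is served by exactly one region server. A minimal sketch, with made-up region boundaries and server names (the real HBase client does this lookup against the META table):

```python
import bisect

# Hypothetical regions: each serves [start_key, next_start_key).
region_starts = ["", "g", "n", "t"]            # 4 regions over the key space
region_servers = ["rs1", "rs2", "rs3", "rs1"]  # server hosting each region

def server_for_row(row_key):
    """Route a row key to its single owning region server, the way an
    HBase Get/Put is routed: binary search over sorted region start keys."""
    idx = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[idx]

print(server_for_row("apple"))   # → rs1  (region ["", "g"))
print(server_for_row("orange"))  # → rs3  (region ["n", "t"))
```

The point of the sketch is that each key maps to exactly one region server, so adding replicas does not spread reads for a single hot key the way PostgreSQL replication does; instead, throughput scales because different key ranges land on different servers.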

> Thanks
> TuX
