hbase-user mailing list archives

From schausson <schaus...@softera.fr>
Subject Writting bottleneck in HBase ?
Date Wed, 23 Nov 2016 16:01:32 GMT

I am new to HBase and I'm facing performance issues ...

Short story: I want to persist 10,000,000 values in HBase, and it takes the same
time on a basic sandbox (HDP Hadoop sandbox with a single region server node)
as it does on our "production" cluster (which comprises 12 region servers
with higher capabilities than my developer laptop).

Detailed case :

Basically, the use case is: my Java application receives a binary file that
contains time series, decodes them, and stores the decoded data in a single
HBase table.
HBase table design: we store one parameter per row, and we create one
column per timestamp to store the associated value.
My test case is based on an input file that yields ~2000 rows/parameters
containing ~5000 values per row (=> around 10,000,000 values to store in my
HBase table in the end).
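
To make the layout concrete, here is a minimal in-memory sketch of it using plain Java collections. It is not HBase client code: the class name WideRowSketch, the String parameter ids and the double values are assumptions made for illustration only, and the sizes in main() are scaled down from the real file.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the table layout only (not the HBase API):
// one row per parameter, one column per timestamp.
final class WideRowSketch {
    // row key (paramId) -> (column qualifier = timestamp -> value)
    private final Map<String, TreeMap<Long, Double>> rows = new TreeMap<>();

    // Stores one value; writing the same (paramId, timestamp) twice replaces the cell.
    void store(String paramId, long timestamp, double value) {
        rows.computeIfAbsent(paramId, k -> new TreeMap<>()).put(timestamp, value);
    }

    int rowCount() {
        return rows.size();
    }

    long cellCount() {
        long n = 0;
        for (TreeMap<Long, Double> columns : rows.values()) {
            n += columns.size();
        }
        return n;
    }

    public static void main(String[] args) {
        WideRowSketch table = new WideRowSketch();
        // Scaled-down stand-in for ~2000 parameters x ~5000 timestamps.
        for (int p = 0; p < 20; p++) {
            for (long t = 0; t < 50; t++) {
                table.store("param-" + p, t, 0.0);
            }
        }
        System.out.println(table.rowCount() + " rows, " + table.cellCount() + " cells");
    }
}
```

With this design the number of rows stays fixed at the number of parameters, while each row grows wider as new timestamps arrive.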

For this purpose, my application uses the HBase client API.
Basically, my code proceeds as follows: it decodes the parameters' time series
from the input file and stores these values in a Map<paramId, List<value>>.

When it reaches 10000 values (a threshold that may be changed), it calls the
persistence method asynchronously and continues decoding until the end
of the input file.
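
The buffering step can be sketched roughly as below. This is a simplified stand-in, not my actual code: the class name ThresholdBuffer, the single background thread and the Consumer used in place of the real persistence method are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Collects decoded values per parameter and hands them to a persistence
// callback on a background thread each time the threshold is reached.
final class ThresholdBuffer {
    private final int threshold;
    private final Consumer<Map<String, List<Double>>> persister; // stand-in for the persistence method
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Map<String, List<Double>> buffer = new HashMap<>();
    private int buffered = 0;

    ThresholdBuffer(int threshold, Consumer<Map<String, List<Double>>> persister) {
        this.threshold = threshold;
        this.persister = persister;
    }

    // Called by the decoding loop for every decoded value.
    void add(String paramId, double value) {
        buffer.computeIfAbsent(paramId, k -> new ArrayList<>()).add(value);
        if (++buffered >= threshold) {
            flush();
        }
    }

    // Swaps in a fresh map so decoding can continue while the batch is persisted.
    private void flush() {
        if (buffered == 0) {
            return;
        }
        Map<String, List<Double>> batch = buffer;
        buffer = new HashMap<>();
        buffered = 0;
        executor.execute(() -> persister.accept(batch));
    }

    // Persists the remainder and waits for pending batches to finish.
    void close() {
        flush();
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        ThresholdBuffer buf = new ThresholdBuffer(10_000, batch -> calls.incrementAndGet());
        for (int i = 0; i < 25_000; i++) {
            buf.add("param-" + (i % 2000), i);
        }
        buf.close();
        System.out.println("persistence calls: " + calls.get());
    }
}
```

Note that in this sketch the decoding thread never waits for a batch to be written; it only waits once, in close(), at the end of the file.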
The persistence method proceeds like this (simplified code) :
for (paramId : map.keys) {
	Put put = new Put(paramId);
	for (value : map.get(paramId)) {
		put.addColumn(family, columnName, value);
	}
	table.put(put);
}

Choosing a threshold value of 10000 leads to ~1000 calls to the persistence
method. Each call generates 2000 calls to the table.put() method, each put
containing ~5 columns.
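
For reference, the arithmetic behind these figures can be checked with a few lines of Java. This is only a sketch: the class name WriteVolume is made up, and it assumes every batch touches all 2000 parameters evenly, which is only approximately true for my file.

```java
// Back-of-the-envelope write volume for the test file described above.
final class WriteVolume {
    // Number of persistence calls needed for totalValues, rounding up.
    static long flushCalls(long totalValues, long threshold) {
        return (totalValues + threshold - 1) / threshold;
    }

    // Columns carried by each Put when a batch is spread evenly over all rows.
    static long columnsPerPut(long threshold, long rows) {
        return threshold / rows;
    }

    public static void main(String[] args) {
        long rows = 2_000;          // parameters, one row each
        long valuesPerRow = 5_000;  // timestamps per parameter
        long threshold = 10_000;    // values buffered before each persistence call

        long totalValues = rows * valuesPerRow;            // 10,000,000
        long calls = flushCalls(totalValues, threshold);   // 1,000
        long putsPerCall = rows;                           // one Put per parameter
        long columns = columnsPerPut(threshold, rows);     // 5
        long totalPuts = calls * putsPerCall;              // 2,000,000

        System.out.println("values=" + totalValues + " calls=" + calls
                + " putsPerCall=" + putsPerCall + " columnsPerPut=" + columns
                + " totalPuts=" + totalPuts);
    }
}
```

So the whole file results in roughly 2,000,000 individual table.put() calls, each carrying only about 5 values.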

When I run this on the HDP sandbox on my laptop (single region server), it
completes in less than 2 minutes.
When I run this on our production cluster (12 region servers), it completes
in 2 minutes and sometimes more.

My question is: is the write load distributed across all the region
servers? Obviously not... What should I do if I want my application to scale
properly when we add additional region servers?

I don't know if I gave enough information, so please do not hesitate to ask
me for more details if needed. Any help would be greatly appreciated.



View this message in context: http://apache-hbase.679495.n3.nabble.com/Writting-bottleneck-in-HBase-tp4084656.html
Sent from the HBase User mailing list archive at Nabble.com.
