hbase-user mailing list archives

From Mat Hofschen <hofsc...@gmail.com>
Subject Import into empty table
Date Wed, 11 Mar 2009 09:14:57 GMT
Hi all,
I am having trouble importing a medium-sized dataset into a new, empty table.
The import runs for about 60 minutes.
There are a lot of failed/killed tasks in this scenario, and sometimes the
import fails altogether.

If I import a smaller subset into the empty table, then manually split the
regions (via the split button on the web page), and then import the larger
dataset, the import runs for about 10 minutes.

It seems to me that the performance bottleneck during the first import is
the single region on a single cluster machine. This machine is heavily
loaded. So my question is whether I can force HBase to split faster during
heavy write operations, and which tuning parameters may affect this
scenario.

Thanks for your help,
Matthias

p.s. here are the details

Details:
33 cluster machines in a test lab (3-year-old servers with a hyperthreaded
single-core CPU), 1.5 GB of memory, Debian 5 (lenny) 32-bit
hadoop 0.19.0, hbase 0.19.0
-Xmx 500mb for java processes
hadoop
mapred.map.tasks=20
mapred.reduce.tasks=15
dfs.block.size=16777216
mapred.tasktracker.map.tasks.maximum=4
mapred.tasktracker.reduce.tasks.maximum=4
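
As a rough sanity check on the memory side of these settings, the sketch below compares the worst-case task heap on one node with its physical RAM. It assumes the 1.5 GB is per machine and that the 500 MB -Xmx applies to every task JVM; neither is stated explicitly above, and the variable names are only illustrative.

```python
# Worst-case task heap per node versus physical RAM.
# Numbers are copied from the settings above. Assumptions (not confirmed
# in the post): 1.5 GB is per machine, and each task JVM gets -Xmx 500 MB.

heap_per_jvm_mb = 500            # -Xmx for java processes
map_slots = 4                    # mapred.tasktracker.map.tasks.maximum
reduce_slots = 4                 # mapred.tasktracker.reduce.tasks.maximum
physical_ram_mb = 1.5 * 1024     # "1.5 gig", read as 1.5 GiB

# Heap that could be requested if every map and reduce slot is busy.
task_heap_mb = (map_slots + reduce_slots) * heap_per_jvm_mb

# How many times over the physical RAM that worst case would be.
oversubscription = task_heap_mb / physical_ram_mb

print(f"task heap with all slots busy: {task_heap_mb} MB")
print(f"physical RAM: {physical_ram_mb:.0f} MB")
print(f"oversubscription factor: {oversubscription:.1f}x")
```

This leaves out the heap of the DataNode, TaskTracker and region server daemons, so under these assumptions the real pressure on a node would be higher than the figure printed here.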

hbase
hbase.hregion.max.filesize=67108864

hbase table
3 column families

import file
5 million records with 18 columns (6 columns per family)
file size: 1.1 GB CSV file
imported via the provided Java SampleUploader
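
To put the two run times side by side, here is a back-of-envelope sketch of the write rates they imply. It assumes both runs load the full 5 million records and that "1.1 gig" means about 1.1 GiB; the function and variable names are only illustrative.

```python
# Rough write rates implied by the two import runs described above.
# Assumptions (not confirmed in the post): both runs load all 5 million
# records, and the CSV is about 1.1 GiB.

records = 5_000_000
columns_per_record = 18
max_region_filesize = 67108864          # hbase.hregion.max.filesize (64 MiB)
csv_bytes = 1.1 * 1024 ** 3             # "1.1 gig" CSV file


def records_per_second(n_records, minutes):
    """Average records written per second over a run of the given length."""
    return n_records / (minutes * 60)


cells = records * columns_per_record            # one cell per column value
slow_rate = records_per_second(records, 60)     # import into a single region
fast_rate = records_per_second(records, 10)     # import after manual splits
speedup = fast_rate / slow_rate

# Ratio of raw CSV size to the split threshold. The size stored in HBase
# differs from the CSV size, so this is not a prediction of region count.
csv_to_threshold = csv_bytes / max_region_filesize

print(f"cells written: {cells}")
print(f"single-region run: {slow_rate:.0f} records/s")
print(f"pre-split run: {fast_rate:.0f} records/s")
print(f"speedup: {speedup:.1f}x")
print(f"CSV size / split threshold: {csv_to_threshold:.1f}")
```

The rates are averages over the whole run, so they hide any period where the single region server was saturated or tasks were being retried.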
