hbase-user mailing list archives

From Ioan Eugen Stan <stan.ieu...@gmail.com>
Subject Re: advice needed on storing large objects on hdfs
Date Fri, 27 Jan 2012 10:51:35 GMT
Hello Rohit,

I would try to write most objects into a Hadoop SequenceFile or a MapFile 
and store the index/byte offset in HBase.
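
Something along these lines (an untested sketch against the old 
Hadoop/HBase client APIs; the file path, table name and column names 
are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = HBaseConfiguration.create();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/blobs/blobs-000001.seq");  // made-up location

SequenceFile.Writer writer = SequenceFile.createWriter(
    fs, conf, path, Text.class, Text.class);      // value class would be your Writable
HTable table = new HTable(conf, "objects");       // made-up table

long offset = writer.getLength();                 // record start, taken before appending
writer.append(new Text("obj-42"), new Text("payload"));

// keep only the pointer in HBase
Put put = new Put(Bytes.toBytes("obj-42"));
put.add(Bytes.toBytes("meta"), Bytes.toBytes("file"),
        Bytes.toBytes(path.toString()));
put.add(Bytes.toBytes("meta"), Bytes.toBytes("offset"), Bytes.toBytes(offset));
table.put(put);

writer.close();
table.close();

Note that append() serializes the value Writable straight onto the 
output stream, so no intermediate byte[] copy is made.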

When reading: open the file, seek() to the stored position, and read the 
key/value pair from there. I don't think using toByteArray() is a good 
idea because it creates a copy of the object in memory; if the object is 
big you end up with two instances of it. Try to stream the object 
directly to disk instead.
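
Roughly (again untested; this assumes the fs/conf/table and the 
meta:file / meta:offset columns from the sketch above):

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

Result r = table.get(new Get(Bytes.toBytes("obj-42")));
Path path = new Path(Bytes.toString(
    r.getValue(Bytes.toBytes("meta"), Bytes.toBytes("file"))));
long offset = Bytes.toLong(
    r.getValue(Bytes.toBytes("meta"), Bytes.toBytes("offset")));

SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
reader.seek(offset);           // an offset from Writer.getLength() is a record boundary
Text key = new Text();
Text value = new Text();       // in practice, your Writable
if (reader.next(key, value)) {
    // value is deserialized straight off the stream, no extra copy
}
reader.close();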

I don't know whether 5 MB is a good threshold or not; I hope someone can shed some light on that.
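
For the threshold check itself: instead of toByteArray().length you 
could count the bytes with a throw-away stream, so you measure the 
serialized size without holding a full copy (hand-rolled sketch; 
CountingStream is not a library class):

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.io.Writable;

class CountingStream extends OutputStream {
    long count = 0;
    public void write(int b) { count++; }                        // discard, just count
    public void write(byte[] b, int off, int len) { count += len; }
}

long serializedSize(Writable w) throws IOException {
    CountingStream c = new CountingStream();
    w.write(new DataOutputStream(c));  // streams through, no byte[] built
    return c.count;
}

// if (serializedSize(myObject) < 5L * 1024 * 1024) -> HBase, else -> HDFS

The downside is that you serialize twice: once to measure, once to store.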

If the objects change: append the new version to the SequenceFile and 
update the reference in HBase. From time to time, run a MapReduce job 
that compacts the file by dropping the stale records.
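
The cleanup could be as simple as copying the still-referenced records 
into a fresh file (sequential sketch only; isLive() and 
updateReference() are made-up helpers that check/update HBase, and a 
real job would do this in MapReduce):

SequenceFile.Reader in = new SequenceFile.Reader(fs, oldPath, conf);
SequenceFile.Writer out = SequenceFile.createWriter(
    fs, conf, newPath, Text.class, Text.class);

Text key = new Text();
Text value = new Text();
while (in.next(key, value)) {
    if (isLive(key)) {                             // made-up: does a row still point here?
        long newOffset = out.getLength();
        out.append(key, value);
        updateReference(key, newPath, newOffset);  // made-up: Put the new pointer
    }
}
in.close();
out.close();
// then delete oldPath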

You can use ZooKeeper to coordinate writing to many SequenceFiles.
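
For example, each writer could claim a file with an ephemeral znode 
(sketch; the connect string and paths are made up, the parent znode 
must already exist, and I pass no watcher for brevity):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
try {
    zk.create("/writers/blobs-000001.seq", new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    // we own this file until our session ends: safe to append
} catch (KeeperException.NodeExistsException e) {
    // someone else is writing it: pick another file or wait
}

A nice property of the ephemeral node is that it disappears if the 
writer dies, so the file is released automatically.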

If you go this way, please post your results.


On 27.01.2012 10:42, Rohit Kelkar wrote:
> Hi,
> I am using HBase to store Java objects. The objects implement the
> Writable interface. The size of the objects to be stored in each row
> ranges from a few KB to ~50 MB. The strategy that I am planning to
> use is:
> if object size < 5 MB
>     store it in HBase
> else
>     store it on HDFS and insert its HDFS location in HBase
> While storing the objects I am using the
> WritableUtils.toByteArray(myObject) method. Can I use
> WritableUtils.toByteArray(myObject).length to determine whether the
> object should go in HBase or HDFS? Is this an acceptable strategy? Is
> the 5 MB limit a safe enough threshold?
> - Rohit Kelkar

Ioan Eugen Stan
