hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Standalone to distributed migration
Date Mon, 12 Oct 2009 15:42:33 GMT
(Mendeley looks great).

See below:

On Mon, Oct 12, 2009 at 1:46 AM, Dan Harvey <dan.harvey@mendeley.com> wrote:

> One question I have is: if we start using the standalone operation on a
> single server initially whilst we set up and test our systems, is it
> possible to migrate from this to the distributed system without having to
> rebuild the data store?

It should just be a matter of copying the data from local disk up to HDFS.
It wouldn't be hard to confirm this for yourself.
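
As a rough sketch of that copy, assuming the standalone data sits under a
local hbase.rootdir and that the `hadoop` command is on the PATH: the local
path and the HDFS URI below are made-up placeholders, and the helper only
builds the command by default rather than running it.

```python
import subprocess


def build_copy_command(local_rootdir, hdfs_rootdir):
    """Return the argv that copies a local HBase root directory into HDFS."""
    return ["hadoop", "fs", "-put", local_rootdir, hdfs_rootdir]


def copy_rootdir(local_rootdir, hdfs_rootdir, dry_run=True):
    """Build the copy command; only execute it when dry_run is False."""
    cmd = build_copy_command(local_rootdir, hdfs_rootdir)
    if not dry_run:
        # Raises CalledProcessError if the hadoop command fails.
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical locations; substitute your own hbase.rootdir and namenode.
LOCAL_ROOTDIR = "/tmp/hbase-example/hbase"
HDFS_ROOTDIR = "hdfs://namenode.example.com:9000/hbase"

if __name__ == "__main__":
    print(" ".join(copy_rootdir(LOCAL_ROOTDIR, HDFS_ROOTDIR)))
```

You would presumably want HBase stopped while the copy runs, and afterwards
hbase.rootdir in hbase-site.xml would need to point at the HDFS location.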

Running a single instance of HBase in anything but a test setup is not
really recommended.  Or rather, we've not spent any time making sure this
sort of deployment is performant.

> A second question is more about trying to understand how to use HBase.
> If we have documents that have many authors, which themselves have a
> varying amount of metadata, what is a good approach to storing this? From
> reading about HBase I see it could be done using a column family on the
> document for, say, author_name:, author_email:, but if there are an unknown
> number of author properties this probably isn't the best way. Would using
> a separate table be better to store the author data in?

How do you think you will be accessing the data?  Will you be doing lookups
on the attribute or by author or both?
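
To make the trade-off concrete, here is a toy model of the two layouts in
plain Python dicts (this is not the HBase client API).  The row keys, the
`author` and `info` family names, and the `<author_id>.<property>` qualifier
scheme are all illustrative assumptions.  It leans on the fact that column
qualifiers within a family do not have to be declared up front.

```python
# Layout A: one wide row per document. Each author property is its own
# column, named "author:<author_id>.<property>".
documents_wide = {
    "doc1": {
        "author:a1.name": "Ada",
        "author:a1.email": "ada@example.com",
        "author:a2.name": "Bob",
    }
}


def authors_from_wide_row(doc_row):
    """Group the 'author:<id>.<prop>' cells of one document row by author id."""
    authors = {}
    for column, value in doc_row.items():
        family, qualifier = column.split(":", 1)
        if family != "author":
            continue
        author_id, prop = qualifier.split(".", 1)
        authors.setdefault(author_id, {})[prop] = value
    return authors


# Layout B: a separate author table keyed by author id. The document row
# only records which authors it has.
authors_table = {
    "a1": {"info:name": "Ada", "info:email": "ada@example.com"},
    "a2": {"info:name": "Bob"},
}
documents_linked = {"doc1": {"author:a1": "", "author:a2": ""}}


def authors_from_linked_row(doc_row, table):
    """Follow each 'author:<id>' column to its row in the author table."""
    ids = [c.split(":", 1)[1] for c in doc_row if c.startswith("author:")]
    return {author_id: table[author_id] for author_id in ids}
```

In layout A a single row read returns a document with all of its authors,
but finding everything by one author means scanning documents.  In layout B
a lookup by author is a single row read, at the cost of one extra read per
author when loading a document.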

> My last question is about using Map/Reduce on top of HBase: is the
> Map/Reduce code still location-aware with respect to where the data is
> stored in HDFS, or does using Map/Reduce create a larger I/O bottleneck
> than using HDFS normally?

The TableInputFormat in HBase passes the MapReduce framework the addresses
of the running regionservers.  In our experience, the MapReduce framework
will nearly always run a task on the tasktracker that is running on the same
machine as the regionserver hosting the task's source region.
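
The idea can be illustrated with a toy scheduler: each input split carries
the host of the regionserver serving its region, and a task is placed on
that host whenever a tasktracker runs there.  This is only a sketch of the
preference, not Hadoop's actual scheduling code; the `Split` type, the host
names, and the round-robin fallback are assumptions made for illustration.

```python
from collections import namedtuple

# A split names a region and the host of the regionserver that serves it.
Split = namedtuple("Split", ["region", "host"])


def assign_tasks(splits, tasktracker_hosts):
    """Map each region to a tasktracker host, preferring the split's own host."""
    trackers = list(tasktracker_hosts)
    assignments = {}
    fallback = 0
    for split in splits:
        if split.host in trackers:
            # Data-local: the task runs next to the regionserver.
            assignments[split.region] = split.host
        else:
            # No tasktracker on that host: fall back to round-robin.
            assignments[split.region] = trackers[fallback % len(trackers)]
            fallback += 1
    return assignments


splits = [
    Split("region-1", "node-a"),
    Split("region-2", "node-b"),
    Split("region-3", "node-x"),  # no tasktracker runs on node-x
]
trackers = ["node-a", "node-b", "node-c"]
result = assign_tasks(splits, trackers)
```

Here region-1 and region-2 are read by tasks on their own machines, and only
region-3 has to be read over the network.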

> If we choose to use HBase I hope to start being more active in the
> community here soon!

Let us know if there is anything we can do to help you with your evaluation.


> Thanks,
> Dan Harvey
