hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: bulk upload for multy column families
Date Wed, 05 Jan 2011 16:41:39 GMT
On Wed, Jan 5, 2011 at 8:29 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
> Hi
>      I've read https://issues.apache.org/jira/browse/HBASE-1861 and the
> mailing list posts regarding this issue.
>
> My questions are:
> 1) Do I understand correctly that this feature will only be supported
> in 0.92.0?

Yes.


> If yes, when do you plan to release this version?

Soon after 0.90. (0.90 is about to put up its 4th release candidate).


> Is it complicated to make the changes needed to support bulk loading
> to multiple column families in the latest released version?


Try it.  IIRC, the patch was not too intrusive.  Download the last
patch posted here: https://review.cloudera.org//r/1272/#review2044


> Simply put, we are going to use it in production, and the most time-consuming
> job is HBase insertion.
> Originally it took 6 hours to write ~5Gb; after some tuning
> (using compression, writing blocks/buffers) it takes ~2.5 hours, but that is
> still a lot for us.
>

That's a long time.  How many row inserts?  Where do you think the time is
being spent?
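
For scale, here is a back-of-envelope sketch of the average write rate those
numbers imply.  It assumes "~5Gb" means roughly 5 gigabytes and takes
1 GB = 1024 MB; the function name is only illustrative, and the result is an
average over the whole job, not a measurement of where the time goes.

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average write rate in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (hours * 3600)

# Figures quoted above: ~5 GB in 6 hours, then ~2.5 hours after tuning.
before = throughput_mb_per_s(5, 6)
after = throughput_mb_per_s(5, 2.5)
print(f"{before:.2f} MB/s before tuning, {after:.2f} MB/s after tuning")
```

Either way the average is well under 1 MB/s, which is why the row count and
where the time is being spent matter more than the raw data size.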


> 2) Currently we are using HBase 0.20.3 and we want to upgrade. What version
> should we use?
>

Oh.  Ignore my advice above that suggests you try the patch against
your current hbase.  The patch won't apply to 0.20.x.  It would likely
work against 0.90.x.

You should at least update to 0.20.6.

0.90.0 should be out soon.  You might want to wait on that.


> 3) I've read as much as I could find about bulk loading but didn't find any
> simple tutorial.
>

Have you seen the bulk load docs?
http://people.apache.org/~stack/hbase-0.90.0-candidate-2/docs/bulk-loads.html
(Bulk Upload was rewritten for 0.90.x.)

St.Ack
