hbase-user mailing list archives

From Jerry He <jerry...@gmail.com>
Subject Re: BulkLoad 200GB table with one region. Is it OK?
Date Thu, 02 Oct 2014 19:49:24 GMT
The reference files will be rewritten during compaction, which normally
happens right after splits.

You did not mention if your 200gb data is one file, or many hfiles.

Jerry
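
To put rough numbers on the rewrite cost discussed in this thread, here is a minimal sketch. It assumes a deliberately simplified model: every split halves each region evenly, and the compaction that follows each split round rewrites the full data set once. The function name `split_cascade` and the 10GB max region size are illustrative assumptions; real split policies and compaction timing in HBase will differ.

```python
def split_cascade(total_gb, max_region_gb):
    """Estimate split rounds and rewritten data for one oversized region.

    Simplified model (an assumption, not HBase's actual policy):
    each round halves every region, and the compaction after the
    round rewrites all of the data once.
    """
    rounds = 0
    region_gb = float(total_gb)
    rewritten_gb = 0.0
    while region_gb > max_region_gb:
        region_gb /= 2            # every region is split in half
        rounds += 1
        rewritten_gb += total_gb  # post-split compaction rewrites everything
    return rounds, rewritten_gb, region_gb


rounds, rewritten_gb, final_gb = split_cascade(200, 10)
print("split rounds:", rounds)
print("data rewritten (GB):", rewritten_gb)
print("final region size (GB):", final_gb)
```

Under these assumptions a 200GB region with a 10GB limit goes through 100, 50, 25, 12.5 and 6.25GB, so the data is rewritten several times over before the regions are small enough.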
On Oct 2, 2014 12:26 PM, "Serega Sheypak" <serega.sheypak@gmail.com> wrote:

> Sorry, massive IO.
> This table is read-only, so HBase should just place reference files. Why
> would HBase rewrite the files?
>
> 2014-10-02 23:24 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
>
> > Hi!
> > http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
> > Says that splitting just places a 'reference' file.
> > Why should there be massive splitting?
> >
> > 2014-10-02 23:08 GMT+04:00 Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> >
> >> Hi Serega,
> >>
> >> Bulk load just "pushes" the files into an HBase region, so there should not
> >> be any issue. The split, however, might take some time because HBase will
> >> have to split the region again and again until it becomes small enough. So
> >> if your max file size is 10GB, it will split to 100GB, then 50GB, then 25GB,
> >> then 12GB, then 6GB... Each time, everything will be rewritten. A LOT of
> >> wasted IO.
> >>
> >> So the answer is: yes, HBase can handle it, BUT it's not good practice.
> >> Better to split the table beforehand and generate the bulk load files based
> >> on the pre-split regions. Also, the massive IO HBase will have to do might
> >> affect the other tables and, in the end, overall performance.
> >>
> >> JM
> >>
> >> 2014-10-02 15:03 GMT-04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> >>
> >> > Hi, I'm doing HBase bulk load to an empty table.
> >> > Input data size is 200GB
> >> > Is it OK to load data into one default region and then wait while HBase
> >> > splits the 200GB region?
> >> >
> >> > I don't have any SLA for the initial load. I can wait until HBase splits
> >> > the initial load files.
> >> > This table is READ only.
> >> >
> >> > The only consideration is not to affect other tables and not to cause
> >> > HBase cluster degradation.
> >> >
> >>
> >
> >
>
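
The pre-splitting advice quoted above can be sketched as follows. This is only an illustration under stated assumptions: row keys are assumed to start with a uniformly distributed, fixed-width, lowercase hex prefix, the 10GB target region size is taken from the example in the thread, and `hex_split_points` is a hypothetical helper, not part of HBase. The resulting keys would then be supplied as the split keys when the table is created, before the bulk load files are generated.

```python
import math


def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced hex split keys.

    Assumes (hypothetically) that row keys begin with a uniformly
    distributed lowercase hex string of `width` characters.
    """
    if num_regions < 2:
        return []
    key_space = 16 ** width
    return [
        format(i * key_space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


total_gb = 200       # input size from the thread
max_region_gb = 10   # example max region size from the thread
regions_needed = math.ceil(total_gb / max_region_gb)

points = hex_split_points(regions_needed)
print("regions:", regions_needed)
print("first split keys:", points[:3])
```

If the real keys are not uniformly distributed, split points derived from a sample of the actual data would be a better fit than evenly spaced ones.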
