hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Re: question on hfile size upper limit
Date Fri, 21 Jun 2019 05:03:53 GMT
Based on the config you pasted:
"<property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.</description>
  </property>"

I can say the issue is the version of HBase.

Older HBase versions behaved the way you describe.  When a single file under
a region's CF grew above the max limit, the region would split.  The check
was written that way because we try to major compact the files under a CF
into one large file anyway, so checking against the largest file was OK.

This was changed later, and we now check the sum of all files under a
region:cf.  I am not sure which version introduced this.  The change became
necessary once we supported features like Date Tiered Compaction and Stripe
Compaction.
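
To make the difference between the two checks concrete, here is a minimal
sketch in Python. It is only a simplified model of what is described above,
not HBase's actual split policy code; the function names and the example
region are made up for illustration.

```python
from typing import Dict, List

GIB = 1024 ** 3
# Value of hbase.hregion.max.filesize from the pasted config (10 GiB).
MAX_FILESIZE = 10737418240


def should_split_largest_file(stores: Dict[str, List[int]],
                              limit: int = MAX_FILESIZE) -> bool:
    """Older check as described above (hypothetical helper):
    split when any single store file under any CF exceeds the limit."""
    return any(size > limit for files in stores.values() for size in files)


def should_split_store_total(stores: Dict[str, List[int]],
                             limit: int = MAX_FILESIZE) -> bool:
    """Newer check as described above (hypothetical helper):
    split when the sum of all files under any one CF exceeds the limit."""
    return any(sum(files) > limit for files in stores.values())


# Hypothetical region: one CF holding three 4 GiB files (12 GiB in total).
region = {"cf1": [4 * GIB, 4 * GIB, 4 * GIB]}

old_result = should_split_largest_file(region)
new_result = should_split_store_total(region)
```

With files that are never compacted into one large file, the largest-file
check does not fire for this region, while the per-CF total check does.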

So to get the behavior you need, try upgrading to a newer version.

Anoop


On Thu, Jun 20, 2019 at 9:55 PM Jean-Marc Spaggiari <jean-marc@spaggiari.org>
wrote:

> Hi,
>
> Just updating what I said (Thanks Anoop for the warning). I assumed that
> you have a single CF... The maxfilesize is per CF, not per region. If you
> have a single CF, then it becomes the same as per region, but a region will
> split whenever one of the CFs reaches the limit.
>
> HBase will not split a single row. So if you have a single row that grows
> bigger than the maxfilesize, the region will keep growing. You need to
> assess this risk when you do your table design and avoid it. It will not
> split even if there are millions of column qualifiers. A region is defined
> by a start row and a stop row. Therefore a single row can belong only to a
> single region.
>
> JMS
>
> On Thu, 20 Jun 2019 at 05:00, Roshan <jlks511@gmail.com> wrote:
>
> > Hi,
> >
> > If a single rowkey in the table exceeds the configured
> > hbase.hregion.max.filesize, will the region split or not? In that case,
> > what performance issues will we face in the cluster?
> >
> > If the rowkey (belonging to a single column family) also has different
> > column qualifiers, will the HFile still not split?
> >
> >
> >
> > On Thu, 20 Jun 2019 at 11:38, wangyongqiang0617@163.com <
> > wangyongqiang0617@163.com> wrote:
> >
> > > this conf:
> > >   <property>
> > >     <name>hbase.hregion.max.filesize</name>
> > >     <value>10737418240</value>
> > >     <description>
> > >     Maximum HStoreFile size. If any one of a column families' HStoreFiles
> > >     has grown to exceed this value, the hosting HRegion is split in
> > >     two.</description>
> > >   </property>
> > >
> > >
> > >
> > >
> > >
> > > wangyongqiang0617@163.com
> > >
> > > From: Jean-Marc Spaggiari
> > > Date: 2019-06-19 06:52
> > > To: user
> > > Subject: Re: question on hfile size upper limit
> > > Hi,
> > >
> > > Can you please confirm which parameter you are talking about? The default
> > > HBase setting is to limit the size per region (10GB by default), and not
> > > by HFiles. This can be configured at the HBase level, or at the table
> > > level.
> > >
> > > HTH,
> > >
> > > JMS
> > >
> > > On Tue, 18 Jun 2019 at 11:32, wangyongqiang0617@163.com <
> > > wangyongqiang0617@163.com> wrote:
> > >
> > > > we set a size upper limit for hfile, but not for region
> > > > so regions have different actual sizes, leading to some analysis tasks
> > > > having different input sizes
> > > >
> > > > can we set a size limit on region
> > > >
> > > >
> > > >
> > > >
> > > > wangyongqiang0617@163.com
> > > >
> > >
> >
>
