hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Question regarding maximum row size.
Date Fri, 28 Jun 2019 04:55:07 GMT
On Wed, Jun 26, 2019 at 1:08 PM Vitaliy Semochkin <vitaliy.se@gmail.com>
wrote:

> Hi,
>
> I have an analytical report that would be very easy to build
> if I could store thousands of cells in one row each cell storing about
> 2kb of information.
> I don't need those rows to be stored in any cache, because they will
> be used only occasionally for analytical reports in Flink.
>
> The question is, what is the biggest size of a row hbase can handle?
> Should I store 2kb rows as MOBs or regular format is ok?
>
> There are old articles that say that large rows, i.e. rows whose total
> size is larger than 10mb, can affect hbase performance,
> is this statement still valid for the modern hbase versions?
> What is the largest row size hbase can handle these days without having
> issues with performance?
> Is it possible to read a row so that its whole content is not read
> into memory (e.g. I would like to read the row's content cell by cell)?
>
>
See
https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAllowPartialResults-boolean-
It speaks to your question. See the 'See Also:' on this method too.

This only works for Scan. It doesn't work if you Get a row (you could run a
Scan limited to that one row if you need the partial results above).
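
A rough, untested sketch of what a one-row Scan with partial results might
look like against the 2.x client. The table name "reports" and the row key
"report-row-1" are made up for illustration, and the 1 MB chunk size is an
arbitrary choice, not a recommendation:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialRowScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    byte[] row = Bytes.toBytes("report-row-1"); // hypothetical row key

    Scan scan = new Scan()
        .withStartRow(row)
        .withStopRow(row, true)            // inclusive stop: scan this one row only
        .setAllowPartialResults(true)      // a Result may hold only part of the row
        .setMaxResultSize(1024L * 1024L)   // assumed ~1 MB per chunk
        .setCacheBlocks(false);            // occasional reads: skip the block cache

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("reports")); // hypothetical table
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result chunk : scanner) {
        // Each chunk carries a slice of the row's cells, in order.
        for (Cell cell : chunk.rawCells()) {
          byte[] value = CellUtil.cloneValue(cell);
          System.out.println(
              Bytes.toString(CellUtil.cloneQualifier(cell)) + " -> " + value.length + " bytes");
        }
        // true while the server may still have cells of this row to send
        if (!chunk.mayHaveMoreCellsInRow()) {
          System.out.println("row complete");
        }
      }
    }
  }
}
```

With partial results on, the caller has to stitch a row back together itself
if it needs the whole thing; the point here is that it never has to.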

HBase has no 'streaming' API that would allow you to return a Cell-at-a-time,
so big rows are a problem if you don't use partial results as above. The big
row is materialized server-side in memory and then again client-side. 10MB is
a conservative upper bound.

2kb Cells should work nicely -- even if a few thousand... especially if you
can use partial.
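
Back-of-envelope, assuming "a few thousand" means about 3000 cells and
ignoring the per-cell key overhead (row key, family, qualifier, timestamp),
which adds somewhat to each cell. The class and method names below are just
for illustration:

```java
public class RowSizeEstimate {
  // Total payload bytes for a row of equally sized cells.
  static long rowBytes(long cells, long bytesPerCell) {
    return cells * bytesPerCell;
  }

  // How many cells of a given size fit under a row-size budget.
  static long maxCells(long budgetBytes, long bytesPerCell) {
    return budgetBytes / bytesPerCell;
  }

  public static void main(String[] args) {
    long cellBytes = 2L * 1024;          // 2kb per cell, from the question
    long budget = 10L * 1024 * 1024;     // the 10MB conservative bound

    System.out.println(rowBytes(3000, cellBytes));  // 6144000 bytes, about 6MB
    System.out.println(maxCells(budget, cellBytes)); // 5120 cells
  }
}
```

So roughly 5000 cells of 2kb each sit at the 10MB mark; a few thousand stay
under it.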

S



> Best Regards
> Vitaliy
>
