hbase-user mailing list archives

From Bill Q <bill.q....@gmail.com>
Subject HBase load distribution vs. scan efficiency
Date Sun, 19 Jan 2014 18:39:08 GMT
I am designing a schema to host a large volume of data in HBase. We
collect daily trading data for several markets, and we run a moving-window
analysis that makes predictions based on a two-week window.

Since everybody is going to pull the latest two weeks of data every day, if
we put the date in the leading position of the row key, we will have some
hot regions. So, we can use a bucketing (salting) approach — mapping each
row to a bucket number modulo the bucket count — to deal with this
situation. However, if we have 200 buckets, we need to run 200 scans to
extract all the data from the last two weeks.
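To make question 2 concrete, here is a minimal sketch (not from the original post) of one common variant of this salting scheme. It assumes the bucket prefix is derived from a hash of the non-date part of the key (here, a hypothetical symbol field), so each day's rows are spread across all buckets — which is exactly why a two-week read needs one scan per bucket:

```java
// Hedged sketch: salted row keys of the form <bucket>|<date>|<symbol>.
// The bucket is assumed to come from hashing the symbol; any stable hash
// of the non-date key part would work. Because every date's rows are
// spread across all NUM_BUCKETS prefixes, reading a date range requires
// one scan per bucket.
public class SaltedKeys {
    static final int NUM_BUCKETS = 200; // matches the example above

    // Build a salted row key for one (date, symbol) pair.
    static String rowKey(String date, String symbol) {
        int bucket = Math.floorMod(symbol.hashCode(), NUM_BUCKETS);
        return String.format("%03d|%s|%s", bucket, date, symbol);
    }

    // Start/stop keys for the per-bucket scan covering [fromDate, toDate].
    // '~' (0x7E) sorts after '|' (0x7C), so the stop key is exclusive of
    // nothing in the window.
    static String[] scanRange(int bucket, String fromDate, String toDate) {
        return new String[] {
            String.format("%03d|%s", bucket, fromDate),
            String.format("%03d|%s~", bucket, toDate)
        };
    }

    public static void main(String[] args) {
        System.out.println(rowKey("2014-01-19", "MKT1"));
        // One scan per bucket to cover the whole two-week window:
        for (int b = 0; b < NUM_BUCKETS; b++) {
            String[] range = scanRange(b, "2014-01-06", "2014-01-19");
            // new Scan().withStartRow(...).withStopRow(...) against HBase here
        }
    }
}
```

With 10 buckets the same window needs only 10 scans, which is the trade-off the questions below are asking about.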

My questions are:
1. What happens when each scan returns its results? Are the results sent to
some sink-like place that collects and concatenates the output of all the
scans?
2. Why might having 200 scans be worse than having only 10?
3. Any suggestions for the design?

Many thanks.

