hbase-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Multiple scans vs single scan with filters
Date Wed, 23 Feb 2011 22:40:30 GMT

It would be great if somebody could share thoughts, ideas, or some numbers
on the following problem.

We have a reporting app. To fetch data for a chart or report we currently
use multiple scans, usually 10-50. We fetch about 100 records with each scan,
and we use those records to construct the report.
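To make the current approach concrete, here is a minimal sketch in Java (the language of the HBase client) that models it without HBase itself. It assumes a `java.util.TreeMap` as a stand-in for a table, since both keep rows sorted by key, and treats each scan as a bounded key range of 100 rows. The table size, the key layout, and the names `buildTable` and `rowsReadByMultipleScans` are hypothetical. It only counts how many rows all the scans read in total.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of "many small scans"; a TreeMap stands in for a sorted HBase table. */
public class MultiScanSketch {

    /** Builds a sorted stand-in "table" with integer row keys 0..rows-1. */
    public static NavigableMap<Integer, String> buildTable(int rows) {
        NavigableMap<Integer, String> table = new TreeMap<>();
        for (int key = 0; key < rows; key++) {
            table.put(key, "value-" + key);
        }
        return table;
    }

    /**
     * Runs one bounded "scan" per start key, each covering [start, start + rowsPerScan),
     * and returns the total number of rows read across all scans.
     */
    public static int rowsReadByMultipleScans(NavigableMap<Integer, String> table,
                                              int[] startKeys, int rowsPerScan) {
        int total = 0;
        for (int start : startKeys) {
            // subMap only visits keys inside the range, like a scan with start/stop rows.
            total += table.subMap(start, true, start + rowsPerScan, false).size();
        }
        return total;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> table = buildTable(200_000);
        // 20 hypothetical scans spread across the key space, 100 rows each.
        int[] startKeys = new int[20];
        for (int i = 0; i < startKeys.length; i++) {
            startKeys[i] = i * 10_000;
        }
        System.out.println(rowsReadByMultipleScans(table, startKeys, 100));
    }
}
```

With 20 scans of 100 rows each, `main` prints 2000: only rows inside the requested ranges are ever touched.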

I've reviewed the data we store and the code logic, and I see that we could
actually fetch the same data with a single scan, using filters to drop the
data that doesn't fit the report params. In this case the scan range would be
about 100-200K records, from which, after filtering, we'd get the same records
that we currently fetch with multiple scans.
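The single-scan alternative can be sketched the same way. This is again a hypothetical model, not the HBase `Filter` API: one scan covers a 200K-row range and an `IntPredicate` stands in for the filters. The predicate `key % 10_000 < 100` is an assumption, chosen so that it keeps the first 100 rows of every 10,000-row block (2,000 rows in total, i.e. what 20 scans of 100 rows would fetch). What it illustrates is that a filter which evaluates rows one by one still has to read every row in the range, even though only the matching rows are returned.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.IntPredicate;

/** Hypothetical model of one wide scan plus a row filter; not the HBase API. */
public class FilteredScanSketch {

    /** Counts of rows read from the range and rows that passed the filter. */
    public static final class ScanResult {
        public final int rowsExamined;
        public final int rowsReturned;

        ScanResult(int rowsExamined, int rowsReturned) {
            this.rowsExamined = rowsExamined;
            this.rowsReturned = rowsReturned;
        }
    }

    /** Builds a sorted stand-in "table" with integer row keys 0..rows-1. */
    public static NavigableMap<Integer, String> buildTable(int rows) {
        NavigableMap<Integer, String> table = new TreeMap<>();
        for (int key = 0; key < rows; key++) {
            table.put(key, "value-" + key);
        }
        return table;
    }

    /** Reads every row in [start, stop) and keeps only rows the filter accepts. */
    public static ScanResult filteredScan(NavigableMap<Integer, String> table,
                                          int start, int stop, IntPredicate keep) {
        int examined = 0;
        int returned = 0;
        for (Map.Entry<Integer, String> row : table.subMap(start, true, stop, false).entrySet()) {
            examined++; // every row in the range is read, match or not
            if (keep.test(row.getKey())) {
                returned++; // only matching rows would be sent back to the client
            }
        }
        return new ScanResult(examined, returned);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> table = buildTable(200_000);
        // Hypothetical filter: keep the first 100 rows of every 10,000-row block.
        ScanResult result = filteredScan(table, 0, 200_000, key -> key % 10_000 < 100);
        System.out.println("examined=" + result.rowsExamined + " returned=" + result.rowsReturned);
    }
}
```

`main` prints `examined=200000 returned=2000`. In HBase, scan filters are evaluated on the region server, so they reduce what crosses the network to the client, but a filter like this one does not reduce how many rows are read within the range.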

So the question is: given these numbers (10-50 scans fetching 100 records
each vs. 1 scan + filters over a range of 100-200K records), will the
optimization I have in mind really improve performance? Unfortunately we
don't currently have a large enough volume of data to run tests on. Maybe
someone can share thoughts based solely on these numbers? Record size is
about 1 KB.
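A back-of-envelope estimate using only the numbers above, sketched in Java. It assumes a record size of roughly 1 KiB and counts rows read, ignoring per-scan RPC overhead, block caching, and compression; the class and method names are hypothetical.

```java
/** Hypothetical back-of-envelope comparison of rows read by each approach. */
public class ScanVolumeEstimate {

    /** Assumed record size: roughly 1 KiB per record. */
    public static final long RECORD_BYTES = 1024;

    /** Rows read by `scans` bounded scans that each return `rowsPerScan` rows. */
    public static long multiScanRows(long scans, long rowsPerScan) {
        return scans * rowsPerScan;
    }

    /** How many times more rows the single filtered scan reads than the multiple scans. */
    public static long readRatio(long singleScanRows, long scans, long rowsPerScan) {
        return singleScanRows / multiScanRows(scans, rowsPerScan);
    }

    /** Converts a row count to MiB under the assumed record size. */
    public static double mebibytes(long rows) {
        return rows * RECORD_BYTES / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Best case for the single scan: smallest range vs. the most scans.
        System.out.println("best case:  " + readRatio(100_000, 50, 100) + "x more rows read");
        // Worst case for the single scan: largest range vs. the fewest scans.
        System.out.println("worst case: " + readRatio(200_000, 10, 100) + "x more rows read");
        System.out.printf("multiple scans read: %.1f-%.1f MiB%n",
                mebibytes(multiScanRows(10, 100)), mebibytes(multiScanRows(50, 100)));
        System.out.printf("single scan reads:   %.1f-%.1f MiB%n",
                mebibytes(100_000), mebibytes(200_000));
    }
}
```

`readRatio(100_000, 50, 100)` is 20 and `readRatio(200_000, 10, 100)` is 200, so under these assumptions the single scan reads between 20 and 200 times as many rows as the multiple scans do. Whether the saved per-scan overhead of 10-50 separate scans outweighs that extra reading is what a test on realistic data would need to show.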

Thank you!
Alex Baranau
