lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ken Krugler <>
Subject Re: Lucene Challenge - sum, count, avg, etc.
Date Thu, 01 Apr 2010 02:23:04 GMT
Hi Mike,

I'm sure there are better options, but one thing you could do is per- 
compute totals for different date resolutions. Depending on the number  
of unique affiliate IDs, this might work.

E.g. pre-calculate sums by day & by week (and maybe by month) for each  
affiliate id, and then turn the query into a set of affiliate_id x  
date range queries. Something like:

affiliate_id:<value> and (day:59 or day:60 or day:61 or week:10 or  
week:11 or week:12 or day:86 or day:87...)

-- Ken

On Mar 31, 2010, at 6:17pm, Michel Nadeau wrote:

> Hi,
> We're currently in the process of switching many of our screens from  
> to Lucene because MySQL simply dies because we have too much data  
> and it's
> becoming too long to generate the stats we need.
> So here's one MySQL query that we use to find out our Top 10  
> Affiliates :
> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY  
> affialiate_id
> ORDER BY total_sales DESC LIMIT 10;
> We currently have our "sales" index, containing all sales and all  
> fields -
> and it's one big index (over 10M records). We could fetch all  
> documents
> within the date range, loop them and add up the total_sales, but it  
> would be
> just crazy to do this all the time (we have a high volume of search).
> We made several tests with Solr (Facets, and even the beta  
> CollapseFields),
> but nothing is really helping us. We could pre-generate the  
> total_sales for
> all possible date ranges... but that would be quite crazy too as the  
> date
> range possibilities quickly become endless.
> So - is there any known way to efficiently do SUM(), COUNT() (and  
> even AVG()
> ) using Lucene/Solr/others? I also checked Bobo Browse but it  
> doesn't seem
> to offer what I need either.
> Thanks for any hints!!!
> - Mike

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message