lucene-solr-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search Server
Date Fri, 21 Aug 2009 16:27:51 GMT
OK, a refinement: "Faceting" as an end-user interface feature (or maybe
"Filtering"?):


A. Faceting (for further filtering)
1. Count the "facets"
2. Sort them by count in descending order
3. Present the top N to the user for filtering/narrowing the search results
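The three faceting steps in list A can be sketched in Python as below. This is a minimal in-memory illustration, assuming a hypothetical `field_values` dict (doc id to facet value) and a set of matching doc ids; it is not how Solr stores or counts facets.

```python
from collections import Counter


def top_facets(result_ids, field_values, n):
    """Count facet values over the matching documents and return the top n.

    result_ids   -- set of doc ids matching the user's query (hypothetical input)
    field_values -- hypothetical dict mapping doc id -> facet value for one field
    n            -- how many facet values to present to the user
    """
    # Step 1: count how many matching documents carry each facet value.
    counts = Counter(field_values[d] for d in result_ids if d in field_values)
    # Steps 2 and 3: sort by count, descending, and keep the top n.
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For example, `top_facets({1, 2, 3, 4, 5}, {1: "CA", 2: "US", 3: "US", 4: "FR", 5: "US"}, 2)` returns `[("US", 3), ("CA", 1)]`.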

B. "Simplified Lucene" (with default operator "AND"):
1. For each term, find its DocSet
2. Calculate the DocSet intersections
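The two "Simplified Lucene" steps in list B amount to a conjunctive query over DocSets. A minimal sketch, assuming a hypothetical `postings` dict that maps each term to a Python set of doc ids standing in for a DocSet:

```python
def and_query(terms, postings):
    """Evaluate a conjunctive (default operator AND) query.

    postings -- hypothetical inverted index: term -> set of doc ids (a "DocSet")
    """
    # Step 1: look up the DocSet for each term (a missing term matches nothing).
    doc_sets = [postings.get(t, set()) for t in terms]
    if not doc_sets:
        return set()
    # Step 2: intersect, smallest set first so intermediate results stay small.
    doc_sets.sort(key=len)
    result = set(doc_sets[0])  # copy so the index itself is never mutated
    for s in doc_sets[1:]:
        result &= s
        if not result:
            break  # nothing left to intersect
    return result
```

Starting from the smallest DocSet is a common design choice: every later intersection can only shrink the result, so the total work is bounded by the rarest term.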


What if we avoid calculating "counts" for facets, and sorting by counts...
and just list the related filters that narrow the search results?
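Listing related filters without counts only requires knowing whether each filter's DocSet overlaps the current results at all. A sketch, assuming a hypothetical `filter_sets` dict (facet value to set of doc ids):

```python
def related_filters(result_ids, filter_sets):
    """List facet values that would narrow the current results, without counts.

    result_ids  -- set of doc ids matching the current query
    filter_sets -- hypothetical map: facet value -> set of doc ids with that value
    """
    # In CPython, set.isdisjoint() stops at the first shared doc id, so no
    # full count is computed and no sort by count is needed; the values are
    # returned in alphabetical order only for a stable display.
    return sorted(v for v, docs in filter_sets.items()
                  if not docs.isdisjoint(result_ids))
```

With `filter_sets = {"US": {1, 2, 3}, "CA": {4}, "FR": {9}}` and results `{1, 4}`, this lists `["CA", "US"]`; "FR" is omitted because selecting it would leave no results.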



P.S.
Faceting on a "country" field with 10 possible values still takes 20-30
seconds for the query id:[* TO *] (100 million docs), although it could
obviously use the FilterCache without any calculations!
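The shortcut suggested in the P.S. can be sketched like this: when the query matches every document, each facet count is simply the size of that value's cached DocSet. The names `filter_cache` and `total_docs` are hypothetical, and this illustrates the poster's suggestion rather than what Solr 1.4 actually does.

```python
def facet_counts(result_ids, filter_cache, total_docs):
    """Facet counts for one field using cached per-value DocSets.

    result_ids   -- set of doc ids matching the query (assumed to be a subset
                    of the index)
    filter_cache -- hypothetical map: facet value -> cached set of doc ids
    total_docs   -- number of documents in the index
    """
    if len(result_ids) == total_docs:
        # The query matches every document (e.g. id:[* TO *]): each count is
        # just the size of the cached DocSet, so no intersection is needed.
        return {v: len(docs) for v, docs in filter_cache.items()}
    # General case: one DocSet intersection per facet value.
    return {v: len(docs & result_ids) for v, docs in filter_cache.items()}
```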



Fuad Efendi
==================================
http://www.linkedin.com/in/liferay
http://www.tokenizer.org
http://www.casaGURU.com
==================================



-----Original Message-----
From: Fuad Efendi [mailto:fuad@efendi.ca] 
Sent: August-21-09 11:42 AM
To: solr-user@lucene.apache.org; yonik@lucidimagination.com
Subject: RE: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search Server

> actually a hybrid that goes back to DocSet intersections when it's more efficient

I noticed that too when I played with it: for large query results, DocSet
intersections are the de facto standard; but when "faceting" started, CNET
had only 400,000 documents :)
Nowadays even a 2-3 second response time is bad... maybe we could store all
users' queries and run some tasks in the background (storing "facets" in a
database similar to a heavyweight data warehouse, predicting facet counts
from query terms and domain analysis, etc.)?


On Fri, Aug 21, 2009 at 11:25 AM, Fuad Efendi <fuad@efendi.ca> wrote:
> I was joking [off-topic]: "faceting" by DocSet intersections replaced by
> trivial term-count calculations, which are much faster in some (if not
> all) use cases, possibly even for NON-tokenized fields (with standard
> faceting we can use the FilterCache)...

One size does not fit all.  The enum method is not outdated or
deprecated, and still works better in some scenarios.  The new
faceting code is actually a hybrid that goes back to DocSet
intersections when it's more efficient.

-Yonik
http://www.lucidimagination.com
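The hybrid Yonik describes can be illustrated with a toy sketch: frequent terms are counted by DocSet intersection (roughly as the enum method does), while the remaining terms are tallied by walking the matching documents' term lists. The `doc_terms` input and the `big_term_df` threshold are hypothetical; this is not Solr's actual faceting code.

```python
from collections import Counter


def hybrid_facet_counts(result_ids, doc_terms, big_term_df):
    """Toy hybrid of the two faceting strategies discussed in this thread.

    result_ids  -- set of doc ids matching the query
    doc_terms   -- hypothetical un-inverted field: doc id -> list of terms
    big_term_df -- hypothetical threshold: terms found in more docs than this
                   are counted by DocSet intersection, not per-doc iteration
    """
    # Build the inverted view (term -> DocSet); a real engine would cache it.
    doc_sets = {}
    for doc, terms in doc_terms.items():
        for t in terms:
            doc_sets.setdefault(t, set()).add(doc)
    big = {t for t, docs in doc_sets.items() if len(docs) > big_term_df}

    counts = Counter()
    # Frequent ("big") terms: one DocSet intersection each.
    for t in big:
        counts[t] = len(doc_sets[t] & result_ids)
    # All other terms: walk the matching docs and tally their terms.
    for doc in result_ids:
        for t in doc_terms.get(doc, ()):
            if t not in big:
                counts[t] += 1
    return counts
```

Whatever the threshold, the counts come out the same; only the cost changes, which is the point of "one size does not fit all".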




