lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: lucene integration with relational database
Date Tue, 18 Jan 2005 20:33:42 GMT

: Thanks for your tips. I am trying to get a more thorough understanding
: why this would be better.

1) give serious consideration to just putting all of your data in lucene
for the purposes of searching.  the initial example mentioned employees
and salaries, and wanted to search for employees with certain names and
salaries < $X ...lucene can do the "salary < $X" part using a RangeFilter.
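
One practical detail if you go this route: term ranges in lucene are compared as strings, so a numeric field like salary has to be indexed in a form whose string order matches its numeric order. A minimal sketch of one way to do that, assuming non-negative whole-number salaries and a fixed width of 10 digits (the class name, method name, and width are all made up for illustration, and the lucene indexing and RangeFilter calls themselves are left out):

```java
import java.util.Locale;

// Hypothetical helper, not part of lucene: zero-pads a salary so that
// plain string comparison agrees with numeric comparison.
final class SortableSalary {

    // Assumed width; must be large enough for the biggest salary indexed.
    static final int WIDTH = 10;

    static String encode(long salary) {
        if (salary < 0) {
            throw new IllegalArgumentException("negative salaries are not handled in this sketch");
        }
        // Locale.ROOT keeps the digits as plain ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", salary);
    }

    public static void main(String[] args) {
        // Unpadded, "900" sorts after "5000" as a string; padded, it sorts before.
        System.out.println("900".compareTo("5000") > 0);
        System.out.println(encode(900).compareTo(encode(5000)) < 0);
    }
}
```

The same encoding would have to be applied both when indexing the salary field and when building the upper bound of the range.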

2) assuming you *must* combine your lucene query with your SQL query...

When your goal is performance, I don't think you'll ever be able to
find a truly generic solution for all situations -- the specifics matter.

For example:

  a) is your goal specifically to discount lucene results that don't meet
     a criteria specified in your DB?
  b) do you care about having an accurate number of total matches, or do
     you only care about "filtering" out results?

depending on the answers, a fairly fast way to "eliminate" results is to
only worry about the page of results you are looking at.  Consider an
employee search application which displays 10 results per page.  first you
do a lucene search by name, then you want to throw out any employees whose
salary is below $X.  use the Hits object from the lucene search to get the
unique IDs for the first 10 employees (which uses a very small, fixed
amount of memory and time, regardless of how big your index/result is)
then do a lookup in your DB using a query built from those 10 IDs, ala:

   select ... from ... where ID in (1234, 5678 ... 7890)

...(which should also be very fast assuming your DB has a primary key on

if the 10 IDs all match your SQL query then you're done.  If N don't match
your query, then you need to find the next N results from Hits that do; so
just repeat the steps above until you've gotten 10 viable results.
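
The loop described above can be sketched roughly as follows. This is only an illustration under some assumptions: the ranked list of IDs stands in for walking the Hits object, the dbMatches function stands in for running the "ID in (...)" query and returning the IDs that came back, and the table and column names in buildSql are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of "check one page at a time against the DB".
final class PagedDbFilter {

    // Builds a parameterized query with one placeholder per ID.
    // "employees", "salary" and "ID" are assumed names.
    static String buildSql(int idCount) {
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "select ID from employees where salary >= ? and ID in (" + placeholders + ")";
    }

    // rankedIds: IDs in lucene score order (stand-in for Hits).
    // dbMatches: given a batch of IDs, returns the ones that also satisfy
    //            the SQL condition (stand-in for executing the query above).
    static List<Integer> firstPage(List<Integer> rankedIds,
                                   Function<List<Integer>, Set<Integer>> dbMatches,
                                   int pageSize) {
        List<Integer> page = new ArrayList<>();
        int next = 0;
        while (page.size() < pageSize && next < rankedIds.size()) {
            // Only ask the DB about as many IDs as are still missing from the page.
            int needed = pageSize - page.size();
            int end = Math.min(next + needed, rankedIds.size());
            List<Integer> batch = rankedIds.subList(next, end);
            Set<Integer> ok = dbMatches.apply(batch);
            // Walk the batch, not the set, so the lucene ranking is preserved.
            for (Integer id : batch) {
                if (ok.contains(id)) {
                    page.add(id);
                }
            }
            next = end;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> ranked = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        // Pretend only even IDs pass the salary condition.
        Function<List<Integer>, Set<Integer>> evens = batch -> {
            Set<Integer> s = new HashSet<>();
            for (Integer id : batch) {
                if (id % 2 == 0) s.add(id);
            }
            return s;
        };
        System.out.println(firstPage(ranked, evens, 3));
        System.out.println(buildSql(3));
    }
}
```

Each pass touches at most one page worth of IDs, so memory and per-query cost stay small no matter how many documents the lucene search matched.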

(given good statistics on your data, you can virtually eliminate the need
to execute more than a few iterations ... if nothing else, you can use the
ratio of misses/hits from the first SQL query -- N of 10 didn't match --
to decide how big to make your second query to ensure you'll get N good
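
As a rough illustration of that sizing idea: if `matched` of the `sampled` IDs in the previous query passed, then asking for about needed * sampled / matched IDs next time should, on average, yield the missing results. The sketch below assumes the match rate stays about the same further down the result list, which is an expectation and not a guarantee. The class name, method name, and the maxBatch cap are invented for the example.

```java
// Hypothetical helper: picks the size of the next "ID in (...)" query
// from the match rate observed in the previous one.
final class BatchSizer {

    // needed:   how many more viable results the page still lacks
    // sampled:  how many IDs the previous SQL query checked
    // matched:  how many of those satisfied the SQL condition
    // maxBatch: assumed upper bound on the size of one IN list
    static int nextBatchSize(int needed, int sampled, int matched, int maxBatch) {
        if (needed <= 0) {
            return 0;
        }
        if (sampled <= 0 || matched <= 0) {
            // No usable estimate (nothing sampled, or nothing matched): fall back to the cap.
            return maxBatch;
        }
        // ceil(needed * sampled / matched), done in long arithmetic to avoid overflow.
        long estimate = ((long) needed * sampled + matched - 1) / matched;
        return (int) Math.min(estimate, maxBatch);
    }

    public static void main(String[] args) {
        // 4 of the first 10 didn't match, so 6 matched and 4 more are needed.
        System.out.println(nextBatchSize(4, 10, 6, 100)); // prints 7
    }
}
```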

