lucene-java-user mailing list archives

From jamie <ja...@stimulussoft.com>
Subject Lucene TermsFilter lookup slow
Date Sat, 08 Aug 2015 12:32:34 GMT
Greetings

Our app primarily uses Lucene for its intended purpose, i.e., to search
across large amounts of unstructured text. Recently, however, our
requirements expanded: we now also need to look up specific documents in
the index by custom-defined unique keys. For our purposes, a unique key
is the string representation of a 128-bit murmur hash, stored in a
Lucene field named uid. We currently use a TermsFilter to look up
documents in the Lucene index as follows:

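import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;

// One exact-match Term per unique key in the "uid" field
List<Term> terms = new ArrayList<>();
for (String id : ids) {
    terms.add(new Term("uid", id));
}
TermsFilter idFilter = new TermsFilter(terms);
... search logic...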
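For illustration only: the post says each uid is the string form of a 128-bit murmur hash but does not state the exact string format. The sketch below assumes the hash is already available as two 64-bit halves and renders it as a fixed-width, zero-padded, lowercase hex string. `UidKeys`, `toUid` and `toUids` are hypothetical names, not part of Lucene. Because a TermsFilter matches terms exactly, whatever format is chosen must be byte-identical at index time and at lookup time.

```java
import java.util.ArrayList;
import java.util.List;

public final class UidKeys {

    // Hypothetical helper: renders a 128-bit hash, given as its high and low
    // 64-bit halves, as a 32-character lowercase hex string. %016x zero-pads
    // each half and prints negative longs as their unsigned two's-complement
    // bit pattern, so every key has exactly the same length.
    static String toUid(long high, long low) {
        return String.format("%016x%016x", high, low);
    }

    // Hypothetical helper: converts a batch of hashes (each {high, low}) into
    // the uid strings that would be passed to new Term("uid", id).
    static List<String> toUids(long[][] hashes) {
        List<String> ids = new ArrayList<>(hashes.length);
        for (long[] h : hashes) {
            ids.add(toUid(h[0], h[1]));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Prints 0000000000000001ffffffffffffffff
        System.out.println(toUid(1L, -1L));
    }
}
```

Zero padding is the design point here: `Long.toHexString` drops leading zeros, so two code paths that format the same hash differently would produce terms that never match.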

At any time we may need to look up, say, a couple of thousand documents.
Our problem is performance. On very large indexes with 30 million
records or more, the lookup can be excruciatingly slow. At this stage,
it's not practical for us to move the data to a fit-for-purpose
database, nor to change the uid field to a numeric type. I fully
appreciate that Lucene is not designed to be a database; however, is
there anything we can do to improve the performance of these lookups?

Much appreciated

Jamie

