lucene-java-user mailing list archives

From Daniel Bigham <dani...@wolfram.com>
Subject Query Expansion for Synonyms
Date Thu, 28 Apr 2016 15:26:15 GMT
I'm investigating various ways of supporting synonyms in Lucene.

One such approach that looks potentially interesting is to do a kind of 
"query expansion".

For example, if the user searches for "us 1888", one might expand the 
query as follows:

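     // Imports as of the Lucene 5.x/6.x package layout.
     import org.apache.lucene.index.Term;
     import org.apache.lucene.search.spans.SpanNearQuery;
     import org.apache.lucene.search.spans.SpanOrQuery;
     import org.apache.lucene.search.spans.SpanQuery;
     import org.apache.lucene.search.spans.SpanTermQuery;

     // Outer query: the expanded first term immediately followed by "1888".
     SpanNearQuery query =
     new SpanNearQuery(
         new SpanQuery[]
         {
             // Position 1: either "us" or the exact phrase "united states".
             new SpanOrQuery(
                 new SpanTermQuery(new Term("Plaintext", "us")),
                 new SpanNearQuery(
                     new SpanQuery[]
                     {
                         new SpanTermQuery(new Term("Plaintext", "united")),
                         new SpanTermQuery(new Term("Plaintext", "states"))
                     },
                     0,     // slop 0: terms must be adjacent
                     true   // in order
                 )
             ),
             // Position 2: the literal term "1888".
             new SpanTermQuery(new Term("Plaintext", "1888"))
         },
         0,     // slop 0: no gap between the two positions
         true   // in order
     );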
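
To illustrate how an expansion like the one above might be generated from a synonym table rather than written by hand, here is a minimal sketch. It uses only the JDK and does not call Lucene: it computes the alternatives for each query position and prints the shape of the resulting query, where "term", "or" and "near" stand for SpanTermQuery, SpanOrQuery and SpanNearQuery (slop 0, in order). The class name, the method names and the SYNONYMS table are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynonymExpansionSketch {

    // Hypothetical synonym table: a single query term -> alternative phrases.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("us", Arrays.asList("united states"));
    }

    // For each whitespace-separated query term, returns the list of
    // alternatives at that position; each alternative is a list of terms
    // (one term for a plain word, several for a multi-word synonym).
    static List<List<List<String>>> expand(String userQuery) {
        List<List<List<String>>> positions = new ArrayList<>();
        for (String term : userQuery.trim().toLowerCase().split("\\s+")) {
            List<List<String>> alternatives = new ArrayList<>();
            alternatives.add(Collections.singletonList(term));
            List<String> phrases = SYNONYMS.getOrDefault(term, Collections.<String>emptyList());
            for (String phrase : phrases) {
                alternatives.add(Arrays.asList(phrase.split("\\s+")));
            }
            positions.add(alternatives);
        }
        return positions;
    }

    // Renders one alternative: a single term, or an ordered, slop-0 "near"
    // over the terms of a multi-word phrase.
    static String renderPhrase(List<String> terms) {
        if (terms.size() == 1) {
            return "term(" + terms.get(0) + ")";
        }
        List<String> parts = new ArrayList<>();
        for (String t : terms) {
            parts.add("term(" + t + ")");
        }
        return "near(" + String.join(", ", parts) + ")";
    }

    // Renders the whole expansion: an "or" wherever a position has more than
    // one alternative, wrapped in an outer "near" when there are several positions.
    static String describe(List<List<List<String>>> positions) {
        List<String> clauses = new ArrayList<>();
        for (List<List<String>> alternatives : positions) {
            List<String> rendered = new ArrayList<>();
            for (List<String> phrase : alternatives) {
                rendered.add(renderPhrase(phrase));
            }
            clauses.add(rendered.size() == 1
                    ? rendered.get(0)
                    : "or(" + String.join(", ", rendered) + ")");
        }
        return clauses.size() == 1
                ? clauses.get(0)
                : "near(" + String.join(", ", clauses) + ")";
    }
}
```

Calling describe(expand("us 1888")) yields near(or(term(us), near(term(united), term(states))), term(1888)), which mirrors the nesting of the hand-written query. Mapping each node onto the corresponding Lucene span query class would be the remaining step.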

A couple of questions:

- Is this approach in use within the community?
- Are there "gotchas" with this approach that make it undesirable?

I've done a few quick tests of query performance on a test index and 
found that a query can indeed take 10x longer if enough synonyms are 
used, but if the baseline search time is around 1 ms, then 10 ms is 
still plenty fast enough. (That said, my test was on a 70 MB index, so 
my 10 ms might turn into something nasty with a 7 GB index.)


