lucene-java-user mailing list archives

From Paul Taylor <>
Subject Re: How best to compare two sentences
Date Fri, 05 Dec 2014 08:58:34 GMT
On 05/12/2014 01:25, Chris Hostetter wrote:
> : For a number of years I've been doing this by creating a
> : RAMDirectory, creating a document for one of the sentences and then doing a
> : search using the other sentence and seeing if we get a good match. This has
> : worked reasonably well, but since improving the performance of other parts of
> : the application this part has become a performance bottleneck. That is not
> : surprising, as I'm creating all these objects just for a one-off search, and I
> : have to do this for many sentence pairs.
>
> i'm not an academic, and i don't want to undermine the very goal specific
> advice given by other folks in this thread - but i do want to point out
> that if you are doing *lots* of comparisons like this, then building a
> RAMDirectory for each and every "known" song title to compare with each and
> every "new" song title is already a super inefficient use of lucene.
>
> if instead you built and *kept* a lucene index containing all known song
> titles (one per doc) and then queried it for each "new" song title that
> came in you'd probably find yourself with a much more efficient solution
> w/o needing to spend a lot of time investigating new algorithms.
Thanks Chris

As I said in the original post, I was using RAMDirectory but have stepped
away from that because it is clearly very inefficient. But I also don't
have access to all the song titles, and after comparing one song title
with another I no longer need to keep the song titles; if I did keep them,
it would eventually consume too much memory. So I really do need to
abandon Lucene for this task and just find the best way to compare two
strings with each other. It already works to a degree using
CosineSimilarity; I was just looking to improve it.
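
For illustration only, here is a minimal sketch of a Lucene-free cosine
comparison of two titles. It is not the code actually in use; the class name
TitleSimilarity, the method names, and the tokenization (lowercase, split on
anything that is not a letter or digit, raw term counts with no weighting)
are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: cosine similarity over raw term-frequency vectors.
final class TitleSimilarity {

    // Count how often each token occurs. The tokenization rule is an assumption:
    // lowercase the text and split on runs of non-letter, non-digit characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sumOfSquares = 0.0;
        for (int count : tf.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine of the angle between the two term vectors, in [0, 1].
    // Returns 0.0 when either string contains no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0;
        }
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : ta.entrySet()) {
            Integer other = tb.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        return dot / (norm(ta) * norm(tb));
    }

    public static void main(String[] args) {
        // Two made-up titles; the second has one extra token.
        System.out.println(cosine("Let It Be", "Let It Be (Remastered)"));
    }
}
```

Nothing is retained between calls, so memory use is bounded by the length of
the two strings being compared. Whole-word matching like this treats a
misspelled word as a complete mismatch, so character n-grams instead of words
may be one direction to try if near-miss spellings matter.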

