lucene-java-user mailing list archives

From Elmer van Chastelet <>
Subject Re: PhoneticFilterFactory 's inject parameter
Date Wed, 25 Apr 2012 15:02:45 GMT
Thanks for your suggestion, Ian, but I just found out that if I replace
the KeywordTokenizer with a WhitespaceTokenizer, everything seems to work fine.

Just to test what happens, I created another field, 'orig', using this
analyzer:
analyzer KeywordLowered{
     tokenizer = KeywordTokenizer
     tokenfilter = LowerCaseFilter
}

Guess what: exactly the same problem, also in Luke.
It finds no documents for the query:
even though the term 'strange' is in the index for the field 'orig'.

Does anybody have a clue why documents are not matched when using the
KeywordTokenizer? Remember that none of the queries or terms contain
whitespace.
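Why the KeywordTokenizer variant fails is not settled in this thread, but one difference between the two tokenizers is worth ruling out. A keyword tokenizer emits the entire field value as a single term, so any leading or trailing whitespace (for example a `\r` left over from reading a file with Windows line endings) stays part of the indexed term, whereas a whitespace tokenizer drops it. As quoted in the original post, the query Luke shows after 'Show All Docs' is 'value:compete ', with a trailing space. The sketch below is a plain-Java simplification, not Lucene's API; the class, its methods and the sample value `"Strange\r"` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerContrast {
    // Simplified model of a keyword tokenizer: the whole input becomes one token,
    // including any leading or trailing whitespace characters.
    static List<String> keywordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        if (!text.isEmpty()) {
            tokens.add(text);
        }
        return tokens;
    }

    // Simplified model of a whitespace tokenizer: split on runs of whitespace
    // (space, tab, CR, LF, ...) and drop empty pieces.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Simplified lower-case filter applied to every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical field value read from a file with Windows line endings.
        String fieldValue = "Strange\r";
        String queryTerm = "strange";

        List<String> keywordTerms = lowerCase(keywordTokens(fieldValue));
        List<String> whitespaceTerms = lowerCase(whitespaceTokens(fieldValue));

        // An exact term lookup only succeeds if the indexed term is identical.
        System.out.println("keyword:    " + keywordTerms.contains(queryTerm));    // keyword:    false
        System.out.println("whitespace: " + whitespaceTerms.contains(queryTerm)); // whitespace: true
    }
}
```

If the real field values do carry such characters, trimming them before indexing (or using a tokenizer that strips them) would make the keyword-tokenized terms line up with the typed query.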

Thanks again.

On 04/25/2012 02:53 PM, Ian Lea wrote:
> You seem to be quietly going round in circles, by yourself!  I suggest
> a small self-contained program/test case with a RAM index created from
> scratch.  You can then experiment with inject on or off and if you
> still can't figure it out, post the code and hopefully someone will be
> able to help you make sense of it.
> Make sure you tell us what version of Lucene you are using.  If it's not
> the latest, it wouldn't hurt to try with the latest.
> --
> Ian.
> On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
> <>  wrote:
>> I keep replying to myself; it all gets a bit confusing.
>> The problem still exists, and I don't understand why, or why it worked once.
>> I see the same behavior again as described in my first mail:
>> - Inject parameter is set to true.
>> - The index has _no deleted documents_ and is optimized.
>> - The term 'compete' is in there.
>> - If I ask Luke to show all docs for the term 'compete', it shows me the one and
>> only document that represents this word. But...
>> - If I perform the query 'value:compete' in Luke again, it says there are no
>> results.
>> Here is the index I'm currently using. It contains various fields for the
>> available phonetic filter encoders:
>> Can somebody explain this behavior? What's the real use of the inject
>> parameter of the PhoneticFilterFactory?
>> Thanks in advance.
>> -Elmer
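On what inject is for: Solr's PhoneticFilterFactory documents inject=true as adding the phonetic code to the token stream and inject=false as replacing the original token with it. With inject=true the code sits at the same position as the original token, which is consistent with the parsed query '(value:KMPT value:compete)' in the original post. The toy model below (plain Java, no Lucene) mimics that, including the situation from the 12:25 message where documents indexed with inject=false are searched with an inject=true analyzer. `toyEncode` is a hypothetical stand-in, not DoubleMetaphone, and the "score" is just the number of matching clauses, not Lucene's scoring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InjectExperiment {
    // Hypothetical toy encoder standing in for DoubleMetaphone (it is NOT that
    // algorithm): upper-case, treat C as K, drop vowels. "compete" -> "KMPT".
    static String toyEncode(String term) {
        StringBuilder code = new StringBuilder();
        for (char c : term.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c == 'C') {
                c = 'K';
            }
            if ("AEIOU".indexOf(c) < 0) {
                code.append(c);
            }
        }
        return code.toString();
    }

    // Models KeywordTokenizer -> LowerCaseFilter -> phonetic filter for one value.
    // inject=true: keep the original token and add its code at the same position.
    // inject=false: the code replaces the original token.
    static List<String> analyze(String text, boolean inject) {
        String token = text.toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        if (inject) {
            terms.add(token);
        }
        terms.add(toyEncode(token));
        return terms;
    }

    // Toy in-memory inverted index: term -> ids of the documents containing it.
    final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text, boolean inject) {
        for (String term : analyze(text, inject)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // The query is always analyzed with inject=true (the current analyzer) and the
    // resulting terms are OR-ed, like '(value:KMPT value:compete)'.
    // Score = number of matching clauses, a crude stand-in for real scoring.
    Map<Integer, Integer> search(String queryText) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String term : analyze(queryText, true)) {
            for (int doc : postings.getOrDefault(term, Collections.emptySet())) {
                scores.merge(doc, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        InjectExperiment injected = new InjectExperiment();
        injected.add(0, "compete", true);
        injected.add(1, "compute", true);
        System.out.println(injected.search("compete")); // {0=2, 1=1}

        InjectExperiment replaced = new InjectExperiment();
        replaced.add(0, "compete", false);
        replaced.add(1, "compute", false);
        System.out.println(replaced.search("compete")); // {0=1, 1=1}
    }
}
```

With inject=true at index time, 'compete' matches both clauses and outranks 'compute'; with inject=false at index time the two tie, because the original term was never indexed and only the phonetic clause can match.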
>> On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
>>> Problem solved. Long story short: for some reason I had deleted documents
>>> in the index and the non-deleted documents used the phonetic filter with
>>> inject set to false.
>>> Works fine now :)
>>> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>>>> Hi all,
>>>> (scroll to bottom for question)
>>>> I was setting up a simple web app to play around with phonetic filters.
>>>> The idea is simple: I create a document for each word in the English
>>>> dictionary, each document containing a single search field holding the value
>>>> after it is preprocessed using the following analyzer definition (in our own DSL
>>>> syntax, which gets transformed to Java):
>>>> analyzer soundslike{
>>>>     tokenizer = KeywordTokenizer
>>>>     tokenfilter = LowerCaseFilter
>>>>     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>>>> }
>>>> I can run the web app and I get results that indeed (in some way) sound
>>>> like the original query term.
>>>> But what confuses me is the ranking of the results, given that I set
>>>> the inject param to true. If I search for the query term 'compete', the
>>>> parsed query becomes '(value:KMPT value:compete)', and therefore I expect
>>>> the word 'compete' to be ranked higher in the list than any other word...
>>>> but this wasn't the case.
>>>> Looking further at the explanation of results, I saw that the term
>>>> 'compete' in the parsed query is totally absent, and only the phonetic
>>>> encoding seems to affect the ranking:
>>>>       o 4.368826 = (MATCH) sum of:
>>>>           + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>>>               # 0.52838135 = queryWeight(value:KMPT), product of:
>>>>                   * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>                   * 0.063904315 = queryNorm
>>>>               # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
>>>>                 product of:
>>>>                   * 1.0 = tf(termFreq(value:KMPT)=1)
>>>>                   * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>                   * 1.0 = fieldNorm(field=value, doc=3174)
>>>> The next thing I did was to run our friend Luke. In Luke, I opened the
>>>> Documents tab and started iterating over some terms for the field 'value'
>>>> until I found 'compete'. When I hit 'Show All Docs', the Search tab opens
>>>> and it displays the one and only document holding this value (i.e. the
>>>> document representing the word 'compete'). It shows the query:
>>>> 'value:compete '. Then, when I hit the search button again (query is still
>>>> 'value:compete '), it says that there are no results!?
>>>> Probably, the 'Show All Docs' button does something different than
>>>> performing a query using the search tab in Luke.
>>>> Q: Can somebody explain why the injected original terms seem to get
>>>> ignored at query time? Or might it be related to the name of the search field
>>>> ('value'), or something else?
>>>> We use Lucene 3.1 with SOLR analyzers (via Hibernate Search 3.4.2).
>>>> -Elmer
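The explain output in the original post can be checked by hand. Assuming Lucene 3.x DefaultSimilarity (idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt of the summed squared clause weights, all boosts 1), the sketch below recomputes the printed values. The printed queryNorm of 0.063904315 is only reproduced when the query also contains a second clause, value:compete, whose docFreq is taken as 0; under these assumptions the numbers are consistent with the exact term 'compete' having no postings as seen by the searcher. The class and method names are illustrative only.

```java
public class ExplainCheck {
    // DefaultSimilarity-style idf (Lucene 3.x): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // queryNorm = 1 / sqrt(sum of squared clause weights), assuming all boosts are 1,
    // so each clause weight is just its idf.
    static double queryNorm(double... clauseIdfs) {
        double sum = 0.0;
        for (double w : clauseIdfs) {
            sum += w * w;
        }
        return 1.0 / Math.sqrt(sum);
    }

    public static void main(String[] args) {
        int maxDocs = 216555;
        double idfKmpt = idf(150, maxDocs);   // value:KMPT, docFreq=150 as printed
        double idfCompete = idf(0, maxDocs);  // value:compete, assuming docFreq=0
        double norm = queryNorm(idfKmpt, idfCompete);

        double queryWeight = idfKmpt * norm;        // idf * queryNorm
        double fieldWeight = 1.0 * idfKmpt * 1.0;   // tf * idf * fieldNorm
        double score = queryWeight * fieldWeight;   // weight(value:KMPT in 3174)

        System.out.printf("idf(KMPT)   = %.5f%n", idfKmpt);
        System.out.printf("queryNorm   = %.9f%n", norm);
        System.out.printf("queryWeight = %.8f%n", queryWeight);
        System.out.printf("score       = %.6f%n", score);
    }
}
```

The results come out at roughly 8.26832, 0.0639043, 0.528381 and 4.36883, matching the explain tree; Lucene computes these in float, so the last digits can differ slightly. Dropping the second clause would instead give queryNorm = 1 / idf(KMPT), about 0.121, which does not match the printed value.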
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
