lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: Ignoring “de la” at index or search time
Date Fri, 01 Mar 2019 15:29:34 GMT
This did not work. Any suggestions, please?

QueryParser parser = new QueryParser(columns[0], analyzer);
Query query5 = parser.parse(q+"~");

I can't set the slop value with

parser.setPhraseSlop(slopValue);

I still see the query printed with the value 2:

Query5:<columns[0]>:<term1> <columns[0]>:<term2>~2


Best regards
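Here is a likely reading of the output above. It is a hedged sketch, not a confirmed diagnosis. In Lucene's classic QueryParser syntax, a trailing ~ on an unquoted term asks for a fuzzy term. With the default OR operator, q+"~" therefore becomes term1 OR term2~2, where 2 is a maximum edit distance, not a phrase slop. setPhraseSlop only supplies the slop for quoted phrases that carry no explicit ~N, which would explain why it had no effect.

The helper below builds the quoted form with an explicit slop. sloppyPhrase is a hypothetical name, not a Lucene API. The parse calls in the comments show assumed usage and are not executed.

```java
final class SloppyPhraseSyntax {
    private SloppyPhraseSyntax() {}

    // Hypothetical helper, not part of Lucene: builds classic QueryParser
    // syntax for a phrase with an explicit slop, e.g. "\"term1 term2\"~3".
    static String sloppyPhrase(String userText, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be non-negative");
        }
        // Simplifying assumption: drop embedded double quotes so they cannot
        // close the phrase early.
        String inner = userText.replace("\"", "").trim();
        return "\"" + inner + "\"~" + slop;
    }

    public static void main(String[] args) {
        String q = "term1 term2";
        // With the default OR operator this is read as term1 OR a fuzzy term2.
        System.out.println(q + "~");
        // A quoted phrase with an explicit slop of 3.
        System.out.println(sloppyPhrase(q, 3));
        // Assumed Lucene usage (not executed here):
        //   Query query5 = parser.parse(sloppyPhrase(q, 3));
        // or rely on the parser default for quoted phrases without ~N:
        //   parser.setPhraseSlop(3);
        //   Query query5 = parser.parse("\"" + q + "\"");
    }
}
```

Putting the slop in the query string makes each query self-describing. Quoting the input and calling setPhraseSlop once is the alternative when every phrase should share one default.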



On 2/25/19 4:37 PM, baris.kazar@oracle.com wrote:
> OK, I found the answer to this question:
>
> parser.setPhraseSlop(slopValue);
>
> Thanks
>
>
> On 2/25/19 11:43 AM, baris.kazar@oracle.com wrote:
>> Thanks, Erick, that was very helpful.
>>
>> Now I see what you meant at the beginning of this thread: stopwords
>> are less of a concern.
>>
>>
>>
>> Now, may I ask the following related question?
>>
>> QueryParser parser = new QueryParser(columns[0], analyzer);
>> Query query5 = parser.parse(q+"~");
>>
>> I see the query above prints out like:
>>
>> Query5:<columns[0]>:<term1> <columns[0]>:<term2>~2
>>
>> where q is the string "term1 term2".
>>
>> So in this case I built a PhraseQuery indirectly, and it seems the default
>> slop is 2, am I right?
>> Is there a way to change the slop in this case? I don't want to change
>> this to a PhraseQuery for now.
>>
>> Thanks
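The question above assumes the classic QueryParser. In that syntax, ~ after a single unquoted term marks a fuzzy term. So <term2>~2 most likely means "index terms within 2 edits of term2", not a phrase slop of 2.

The sketch below computes an optimal string alignment distance. Insertion, deletion, substitution, and swapping two adjacent characters each cost one edit. This approximates what FuzzyQuery compares when transpositions are enabled, which is its default, and Lucene limits the edit distance to at most 2. FuzzyEditDistance and withinEdits are illustrative names, not Lucene APIs.

```java
final class FuzzyEditDistance {
    private FuzzyEditDistance() {}

    // Optimal string alignment distance: insertions, deletions, substitutions
    // and swaps of two adjacent characters each cost one edit.
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" -> "ba", counts as one edit.
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // Roughly what term2~2 asks for: index terms within maxEdits of the query term.
    static boolean withinEdits(String queryTerm, String indexTerm, int maxEdits) {
        return distance(queryTerm, indexTerm) <= maxEdits;
    }
}
```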
>>
>> On 2/24/19 9:21 PM, Erick Erickson wrote:
>>> Case 1. Stopwords are irrelevant. If you search field:(a AND b) you're
>>> asking if both appear in the field, and that's the only question. It
>>> doesn't matter what other words are in the field. It doesn't matter
>>> whether they're close to each other.
>>>
>>> Case 2. Yep.
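Erick's Case 1 answer can be checked with a toy model. A non-phrase AND query only asks whether each term occurs somewhere in the field, so positions and intervening words never matter.

The same model addresses the quoted question about adding stopwords at search time only. Removing “de” and “la” from the query lets “a de la b” match a document containing just “a b”, and documents indexed with the stopwords still match.

The analysis here is a simplifying assumption, not Lucene's StandardAnalyzer or StopFilter: lowercase, split on non-alphanumeric characters, then apply an optional stopword set. AndQueryModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AndQueryModel {
    private AndQueryModel() {}

    // Simplified analysis (an assumption): lowercase, split on runs of
    // non-letter/non-digit characters, and drop any term in the stopword set.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Non-phrase AND: every query term must occur somewhere in the document.
    // Order, distance, and other words in the field are ignored.
    static boolean andMatches(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return false; // nothing left to match, e.g. a query made only of stopwords
        }
        return new HashSet<>(docTerms).containsAll(queryTerms);
    }
}
```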
>>>
>>> On Sun, Feb 24, 2019, 17:02 baris.kazar <baris.kazar@oracle.com> wrote:
>>>
>>>> There is also PhraseQuery, but let's consider two cases:
>>>>
>>>> Case 1: PhraseQuery is not used.
>>>> Should I add the French stopwords to the standard filter's stopwords,
>>>> at both index and search time? Can I add them at search time only and
>>>> keep the old friends index as it is?
>>>>
>>>> Case 2: PhraseQuery is used.
>>>> I guess I need to play with the “slops” in this case, and stopwords
>>>> will not help, right?
>>>>
>>>> Thanks
>>>>
>>>>> On Feb 24, 2019, at 2:25 PM, baris.kazar <baris.kazar@oracle.com> wrote:
>>>>>
>>>>> That is not what I am looking for. Thanks.
>>>>>
>>>>> The search string “c b” finds
>>>>> a b
>>>>> but it can't find
>>>>> a de la b,
>>>>> so I will try French stopwords.
>>>>> For this, I am using 8 queries like the ones I mentioned.
>>>>> Best
>>>>>
>>>>>> On Feb 24, 2019, at 1:19 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>> Phrase search is looking for words next to each other. A phrase search
>>>>>> on the text “my dog has fleas” would succeed for “my dog” or “has
>>>>>> fleas” but not “my fleas”, since the words are not right next to each
>>>>>> other. “my fleas”~3 would succeed because the “~3” indicates that the
>>>>>> words can have intervening terms.
>>>>>> Searching (dog AND fleas) would match no matter how many words were
>>>>>> between the two.
>>>>>> If you’re unclear about what phrase search vs. non-phrase search
>>>>>> means, some background research/self-education is strongly
>>>>>> recommended; such a basic understanding of search is pretty much
>>>>>> assumed.
>>>>>> Best,
>>>>>> Erick
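Erick's “my fleas”~3 example can be reproduced with a simplified position model. Shift each phrase term's document position back by the term's offset in the phrase. The phrase matches when the spread between the largest and smallest shifted positions is at most the slop.

This follows the idea behind Lucene's sloppy phrase matching, but it is only a sketch under stated assumptions:

- it brute-forces every combination of occurrences,
- it assumes the phrase terms are distinct,
- it tokenizes with a plain lowercase split.

SloppyPhraseModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SloppyPhraseModel {
    private SloppyPhraseModel() {}

    // 0-based token positions from a simple lowercase split (an assumption).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Smallest slop at which the phrase matches, or -1 if a term is missing.
    // Assumes the phrase terms are distinct.
    static int requiredSlop(List<String> doc, String... phrase) {
        if (phrase.length == 0) throw new IllegalArgumentException("empty phrase");
        List<List<Integer>> occurrences = new ArrayList<>();
        for (String term : phrase) {
            List<Integer> positions = new ArrayList<>();
            for (int p = 0; p < doc.size(); p++) {
                if (doc.get(p).equals(term)) positions.add(p);
            }
            if (positions.isEmpty()) return -1;
            occurrences.add(positions);
        }
        return minSpread(occurrences, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Pick one occurrence per phrase term (brute force), shift each position
    // back by the term's offset in the phrase, and keep the smallest spread.
    private static int minSpread(List<List<Integer>> occ, int i, int lo, int hi) {
        if (i == occ.size()) return hi - lo;
        int best = Integer.MAX_VALUE;
        for (int p : occ.get(i)) {
            int shifted = p - i;
            best = Math.min(best, minSpread(occ, i + 1, Math.min(lo, shifted), Math.max(hi, shifted)));
        }
        return best;
    }

    static boolean phraseMatches(String text, int slop, String... phrase) {
        int needed = requiredSlop(tokens(text), phrase);
        return needed >= 0 && needed <= slop;
    }
}
```

For “my dog has fleas”, the phrase “my fleas” needs a slop of 2 in this model, so ~3 matches and the exact phrase does not. The reversed phrase “fleas my” needs 4, because reordering costs more than a gap.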
>>>>>>
>>>>>>> On Feb 24, 2019, at 9:25 AM, baris.kazar <baris.kazar@oracle.com> wrote:
>>>>>>> I guess so.
>>>>>>> What is phrase search?
>>>>>>> If “c b” is searched, do you expect to find “a de la b”?
>>>>>>> Thanks
>>>>>>>
>>>>>>>> On Feb 24, 2019, at 10:49 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>> Not sure we’re talking about the same thing. I was talking
>>>>>>>> specifically about _phrase_ searches. If all you want is the clause you
>>>>>>>> just described, phrases are not involved at all, and the presence or
>>>>>>>> absence of intervening words is totally irrelevant. This assumes your
>>>>>>>> field type tokenizes the input similarly to the text_general field in
>>>>>>>> the examples; specifically _not_ “string” fields or fields that use
>>>>>>>> KeywordTokenizer.
>>>>>>>> q=name:(a AND b) OR name:b
>>>>>>>>
>>>>>>>> for instance. With a query like that it doesn’t matter in the least
>>>>>>>> whether or not there are any words between “a” and “b”.
>>>>>>>> All that may be obvious to you, but when I read your latest e-mail it
>>>>>>>> occurred to me that we might not be talking about the same thing.
>>>>>>>> Best,
>>>>>>>> Erick
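Erick's caveat about “string” fields and KeywordTokenizer comes down to which terms reach the index. In the toy model below, a tokenized field yields one term per word. A keyword-style field keeps the whole value as a single term, so name:(a AND b) has nothing to match. Both “field types” are hypothetical stand-ins written for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class FieldTypeModel {
    private FieldTypeModel() {}

    // Stand-in for a tokenized text field: one lowercase term per word.
    static List<String> tokenizedField(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Stand-in for a "string" field or KeywordTokenizer: the whole value is one term.
    static List<String> keywordField(String value) {
        return Collections.singletonList(value);
    }

    // name:(a AND b): both terms must be present among the field's terms.
    static boolean andQuery(List<String> fieldTerms, String a, String b) {
        return fieldTerms.contains(a) && fieldTerms.contains(b);
    }
}
```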
>>>>>>>>
>>>>>>>>> On Feb 23, 2019, at 7:33 PM, baris.kazar <baris.kazar@oracle.com> wrote:
>>>>>>>>> In this case the search string is “c b”,
>>>>>>>>> and the search query has 8 combinations,
>>>>>>>>> including two cases with “c b ~”, which means find all documents
>>>>>>>>> containing c AND b, and c OR b (two separate queries with ~).
>>>>>>>>> With those I can find “a b” but not “a de la b” without French
>>>>>>>>> stopwords.
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>> On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>>>> Lucene won’t ignore these unless you tell it to via stopwords.
>>>>>>>>>>
>>>>>>>>>> This is a problem no matter how you look at it. If you do put in
>>>>>>>>>> stopwords, the word _positions_ are retained. In your example,
>>>>>>>>>> word     position
>>>>>>>>>> a        1
>>>>>>>>>> de       2
>>>>>>>>>> la       3
>>>>>>>>>> b        4
>>>>>>>>>>
>>>>>>>>>> If you remove “de” and “la” via stopwords, the positions are still:
>>>>>>>>>>
>>>>>>>>>> word     position
>>>>>>>>>> a        1
>>>>>>>>>> b        4
>>>>>>>>>>
>>>>>>>>>> So searching for “a b” would still fail in the second case unless
>>>>>>>>>> you included “slop”, as in
>>>>>>>>>> “a b”~2
>>>>>>>>>>
>>>>>>>>>> But let’s say you _do not_ have input with these stopwords, just
>>>>>>>>>> “a b”. The positions will be 1 and 2 respectively. Here the user
>>>>>>>>>> would expect “a b” to match this doc, but not a doc with “a de la b”
>>>>>>>>>> (unless they knew a lot about search!).
>>>>>>>>>>
>>>>>>>>>> So maybe the right thing to do is to let phrases have slop as a
>>>>>>>>>> matter of course.
>>>>>>>>>> Best,
>>>>>>>>>> Erick
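Erick's position tables can be reproduced with a small model of a stop filter. It drops stopwords but keeps each remaining word at its original position, as he describes. The slop needed for “a b” comes out as 2 whether or not “de” and “la” are removed, which is his point: stopwords alone do not make the phrase match.

StopwordPositions and its methods are illustrative names. Two assumptions apply: positions are 1-based to mirror the tables, and tokenization is a plain whitespace split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordPositions {
    private StopwordPositions() {}

    // A kept term together with the position it had before stopword removal.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Whitespace split, 1-based positions as in the tables above. Every word
    // consumes a position, but stopwords are not emitted, leaving a gap.
    static List<Token> analyze(String text, Set<String> stopwords) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) continue;
            position++;
            if (!stopwords.contains(word)) out.add(new Token(word, position));
        }
        return out;
    }

    // Slop needed for the two-term phrase "first second", using the first
    // occurrence of each term; -1 if either term is missing.
    static int slopForPair(List<Token> tokens, String first, String second) {
        int p1 = -1;
        int p2 = -1;
        for (Token t : tokens) {
            if (p1 < 0 && t.term.equals(first)) p1 = t.position;
            if (p2 < 0 && t.term.equals(second)) p2 = t.position;
        }
        if (p1 < 0 || p2 < 0) return -1;
        // "second" sits at offset 1 in the phrase, so shift it back by one.
        return Math.abs((p2 - 1) - p1);
    }
}
```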
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.kazar@oracle.com> wrote:
>>>>>>>>>>> Thanks, Erick. There is a pattern I can't catch in my results,
>>>>>>>>>>> such as:
>>>>>>>>>>> a de la b
>>>>>>>>>>> I do catch “a b”, though.
>>>>>>>>>>> I thought Lucene might ignore those automatically while creating
>>>>>>>>>>> the index.
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>>>>>> Use stopwords, although it's becoming less of a concern. Why do you
>>>>>>>>>>>> think you need to?
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.kazar@oracle.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> What is the (most efficient) way to
>>>>>>>>>>>>> ignore connectors like “de la”
>>>>>>>>>>>>> in a string at index or search time?
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>>>>
>>
>

