lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: FuzzyQuery
Date Wed, 12 Jun 2019 15:36:15 GMT
Tomoko,

  Thank you for your suggestions. I am trying to understand them, and I
thought I did :)

But it does not work with FuzzyQuery when I use a *single* large
TextField like street=...value... city=...value... region=...value...
country=...value... (with or without quotes around the values).

What I knew about Lucene fuzzy queries does not hold with this
TextField layout. That is why I suspected a bug.

1. Yes, I saw that, and I now have solid proof of it.

2. Yes, but FuzzyQuery takes the quotes as they are, since they are
escaped and the term is not analyzed.

Stuffing everything into one TextField versus having separate fields
should probably affect only performance, not the outcome, in my case.
But I have been thinking about this, and maybe separate fields are the
way to go in this case.

My CONTENT field has street names in mixed case and city, region, and
country names in UPPERCASE. Can this be a problem?
I thought the index stored them in lowercase since I am using StandardAnalyzer.

The CONTENT field also holds the full TextField string with street=... city=...
region=... country=... (here all values are UPPERCASE).

Why can't the index find the names via FuzzyQuery? I tried both
FuzzyQuery and the query builder, as I showed before.

The last piece of advice in your previous email deserves to be outside
the parentheses, since it might be the critical one :) :) :)

Best regards


On 6/12/19 12:17 AM, Tomoko Uchida wrote:
> I'd suggest correctly understanding the way a piece of software works
> before suspecting a bug in it :-)
>
> I guess you may be missing two points:
>
> 1. The standard analyzer (standard tokenizer) breaks words at double
> quotes (U+0022), so quotes are not indexed or searched at all if you are
> using the standard analyzer. (That is why you get the same results
> with or without quotes.)
> See: https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
> and http://unicode.org/reports/tr29/
>
> 2. The double quote has a special meaning (it is interpreted as a phrase
> query) in the built-in query parser, so you need to escape it if you want
> to search for double quotes themselves.
> See: http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
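A rough way to see point 1 is to mimic what happens to the indexed value. The sketch below is only a crude ASCII approximation, assumed for illustration: it breaks on every character that is not a letter or digit and then lowercases, as StandardAnalyzer's lowercase step does. The real StandardTokenizer implements the full UAX #29 word-break rules, and the class name `TokenSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Crude approximation of StandardAnalyzer output for simple ASCII text:
// split on any non-letter/non-digit character, then lowercase each token.
// The real StandardTokenizer follows Unicode UAX #29 word breaking.
class TokenSketch {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // '=', '"', '~' and spaces end the current token and are dropped
                out.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("street=\"MAIN\" city=\"NASHUA\""));
        System.out.println(tokens("street=\"MAINS~\""));
    }
}
```

With this approximation, `street="MAIN"` and `street=MAIN` both become the terms `street` and `main`, which is consistent with getting the same results with or without quotes.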
>
> (My advice would be to create a separate field for each key-value pair
> instead of stuffing all pairs into one text field, if you need to
> search them separately.)
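One way to act on that advice is to split the stored `key="value"` string into pairs before indexing, so that each pair can become its own field. The sketch assumes every value is wrapped in double quotes and contains no embedded quote; `KeyValueSplitter` is a hypothetical name, and the Lucene call shown in the comment (`new TextField(key, value, Field.Store.NO)`) is illustrative only and is not executed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueSplitter {
    // Matches key="value" pairs; assumes values never contain a double quote.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Returns the pairs in their original order, e.g. street -> MAIN.
    static Map<String, String> split(String searchKey) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(searchKey);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String key = "street=\"MAIN\" city=\"NASHUA\" municipality=\"HILLSBOROUGH\" "
                + "region=\"NEW HAMPSHIRE\" country=\"UNITED STATES\"";
        for (Map.Entry<String, String> e : split(key).entrySet()) {
            // In Lucene, each pair could then be indexed as its own field, e.g.
            // doc.add(new TextField(e.getKey(), e.getValue(), Field.Store.NO));
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With a dedicated `city` field, a clause aimed at the city can no longer be satisfied by a street named Nashua, which is the problem reported with the first query set in the thread.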
>
> On Wed, Jun 12, 2019 at 2:39 AM <baris.kazar@oracle.com> wrote:
>> I can say that the quotes are not the issue with the index, as it still
>> returns the same results with or without quotes.
>>
>> I am starting to feel that this might be a bug??
>>
>> Best regards
>>
>>
>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
>>> Somehow the " character is causing an issue, as this should return the street MAIN:
>>>
>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
>>> states"] -> this was with a FuzzyQuery on MAINS
>>>
>>> Best regards
>>>
>>>
>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>> +contentDFLT:"country united states", contentDFLT:street
>>>> contentDFLT:mains]
>>>>
>>>> QueryParser chops it into two pieces from
>>>> parser.parse("street=\"MAINS\"");
>>>>
>>>> The index has a TextField named contentDFLT with the following data:
>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
>>>> HAMPSHIRE" country="UNITED STATES"
>>>>
>>>>
>>>> When I pass street=\"MAINS~\" to the parser,
>>>> I get the following:
>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>> +contentDFLT:"country united states", contentDFLT:street
>>>> contentDFLT:mains]
>>>>
>>>> Probably the " quotation marks are messing this up, as you were saying...
>>>> Best regards
>>>>
>>>>
>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
>>>>> Or, the " (double quotation mark) in your query string may affect query parsing.
>>>>>
>>>>> When I parse this string with the classic query parser (Lucene 8.1),
>>>>> street="MAINS~"
>>>>> the parsed (raw) query is
>>>>> text:street text:mains
>>>>> (I set the default search field to "text", so text:xxxx appears
>>>>> here.)
>>>>>
>>>>> Query parsing is a complex process, so it would be good to check the
>>>>> parsed raw query string, especially when you have (reserved) special
>>>>> characters in your query...
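To see which characters the classic parser treats as syntax, here is a minimal stand-alone escaping sketch. The character list is assumed to match the one used by Lucene's own static `QueryParser.escape(String)`, which is what real code should call; the class name `EscapeSketch` is hypothetical, and the list should be checked against the query parser documentation for the Lucene version in use.

```java
// Prefixes each classic-QueryParser syntax character with a backslash.
// The character set is an assumption modeled on Lucene's QueryParser.escape.
class EscapeSketch {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape so the parser reads it literally
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("street=\"MAINS~\""));
    }
}
```

Note that `=` is not in the list, so `street=` is parsed as an ordinary term, which is why it shows up as `text:street` in the parsed query. Escaping `"` only stops the parser from building a phrase; StandardAnalyzer still drops the quote characters when it analyzes the text.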
>>>>>
>>>>> On Tue, Jun 11, 2019 at 1:10 AM Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I noticed one small thing in your previous mail.
>>>>>>
>>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>>>>> which is good.
>>>>>>
>>>>>> To specify a search field, ":" (colon) should be used instead of "=".
>>>>>> See the query parser documentation:
>>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
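Assuming the data were indexed into a separate field named `street` (the current single-field index has no such field), a field-scoped fuzzy clause in classic syntax uses the colon and the `~N` suffix, for example `street:mains~2`. The hypothetical helper below builds such a string. It lowercases the term defensively to mirror StandardAnalyzer, and it rejects edit distances above 2 because Lucene's FuzzyQuery supports at most 2 edits.

```java
import java.util.Locale;

// Hypothetical helper: builds a classic-QueryParser fuzzy clause like "street:mains~2".
class FuzzyClause {
    static String build(String field, String term, int maxEdits) {
        if (maxEdits < 0 || maxEdits > 2) {
            // FuzzyQuery supports at most 2 edits.
            throw new IllegalArgumentException("maxEdits must be 0, 1 or 2");
        }
        if (!term.chars().allMatch(Character::isLetterOrDigit)) {
            // Keep the sketch simple: no query-syntax characters in the term.
            throw new IllegalArgumentException("term must be letters/digits only: " + term);
        }
        // ':' selects the field; '=' would just become part of an ordinary term.
        return field + ":" + term.toLowerCase(Locale.ROOT) + "~" + maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(build("street", "MAINS", 2)); // street:mains~2
    }
}
```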
>>>>>>
>>>>>>
>>>>>> I'm not sure this is related to your problem.
>>>>>>
>>>>>> On Tue, Jun 11, 2019 at 12:51 AM <baris.kazar@oracle.com> wrote:
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>>>>>
>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>>>>> phraseAnalyzer) ;
>>>>>>>            Query q1 = null;
>>>>>>>            try {
>>>>>>>                q1 = parser.parse("MAIN");
>>>>>>>            } catch (ParseException e) {
>>>>>>>
>>>>>>>                e.printStackTrace();
>>>>>>>            }
>>>>>>>            booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>>>>>
>>>>>>> testQuerySearch2 Time to compute: 0 seconds
>>>>>>> Number of results: 1775
>>>>>>> Name: Main St
>>>>>>> Score: 37.20959
>>>>>>> ID: 12681979
>>>>>>> Country Code: US
>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>
>>>>>>> Name: Main St
>>>>>>> Score: 37.20959
>>>>>>> ID: 12681977
>>>>>>> Country Code: US
>>>>>>> Coordinates: 42.747, -71.45957
>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>
>>>>>>> Name: Main St
>>>>>>> Score: 37.20959
>>>>>>> ID: 12681978
>>>>>>> Country Code: US
>>>>>>> Coordinates: 42.73492, -71.44951
>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>
>>>>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same
>>>>>>> results, which is good.
>>>>>>>
>>>>>>> But when I switch to MAINS~, the fuzzy query does not work.
>>>>>>>
>>>>>>>
>>>>>>> I need to point out something about q1 on its own in the BooleanQuery:
>>>>>>> it tries to match MAIN in street, city, region and country, which are
>>>>>>> all in a single TextField.
>>>>>>> But I don't want this. That is why I need to use street="..." etc. when
>>>>>>> searching.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Just for basic verification: can you find the document without a
>>>>>>>> fuzzy query? I mean, does this query work for you?
>>>>>>>>
>>>>>>>> Query query = parser.parse("MAIN");
>>>>>>>>
>>>>>>>> Tomoko
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2019 at 12:22 AM <baris.kazar@oracle.com> wrote:
>>>>>>>>> Why doesn't the second set work at all?
>>>>>>>>>
>>>>>>>>> It is indexed as a TextField like street="..." city="..." etc.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>>>>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
>>>>>>>>>> probably suggesting
>>>>>>>>>>
>>>>>>>>>> QueryParser parser = new QueryParser(field, analyzer);
>>>>>>>>>> Query query = parser.parse("MAINS~2");
>>>>>>>>>>
>>>>>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>>>>>>>
>>>>>>>>>> Am I right?
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>>>>>>>> I would suggest using a QueryParser for your fuzzy query before
>>>>>>>>>>> adding it to the Boolean query. This should weed out any case
>>>>>>>>>>> issues.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com
>>>>>>>>>>> <mailto:baris.kazar@oracle.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>        BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>>>>>>>>
>>>>>>>>>>>        // First set
>>>>>>>>>>>        booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
>>>>>>>>>>>                BooleanClause.Occur.SHOULD);
>>>>>>>>>>>        booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>>>>>>                BooleanClause.Occur.MUST);
>>>>>>>>>>>        booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>>>>>>                BooleanClause.Occur.MUST);
>>>>>>>>>>>        booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>>>>>>                BooleanClause.Occur.MUST);
>>>>>>>>>>>
>>>>>>>>>>>        // Second set
>>>>>>>>>>>        //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>>>>>>>>        //        BooleanClause.Occur.SHOULD);
>>>>>>>>>>>        //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>>>>>>>>        //        BooleanClause.Occur.MUST);
>>>>>>>>>>>        //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>>>>>>>>        //        BooleanClause.Occur.MUST);
>>>>>>>>>>>        //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>>>>>>>>        //        BooleanClause.Occur.MUST);
>>>>>>>>>>>
>>>>>>>>>>>        The first set also brings back streets named Nashua
>>>>>>>>>>>        (NASHUA).
>>>>>>>>>>>
>>>>>>>>>>>        So, to prevent that, and since I also indexed with
>>>>>>>>>>>        street="..." city="...", I tried the second set, but it
>>>>>>>>>>>        does not bring back anything.
>>>>>>>>>>>
>>>>>>>>>>>        createPhraseQuery builds a PhraseQuery with one term
>>>>>>>>>>>        equal to the string in the call.
>>>>>>>>>>>
>>>>>>>>>>>        Best regards
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        On 6/10/19 10:47 AM, baris.kazar@oracle.com
>>>>>>>>>>>        <mailto:baris.kazar@oracle.com> wrote:
>>>>>>>>>>>        > How do I check how it is indexed? Lowercase or uppercase?
>>>>>>>>>>>        >
>>>>>>>>>>>        > The only way for now is by testing.
>>>>>>>>>>>        >
>>>>>>>>>>>        > I am using StandardAnalyzer.
>>>>>>>>>>>        >
>>>>>>>>>>>        > Best regards
>>>>>>>>>>>        >
>>>>>>>>>>>        >
>>>>>>>>>>>        > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>>>>>>>>        >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
>>>>>>>>>>>        >> <tomoko.uchida.1111@gmail.com <mailto:tomoko.uchida.1111@gmail.com>> wrote:
>>>>>>>>>>>        >>> Hi,
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> What analyzer do you use for the text field? Is the term "Main"
>>>>>>>>>>>        >>> correctly indexed?
>>>>>>>>>>>        >> Agreed. Also, it would be good if you could post your actual code.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> What analyzer are you using? If you are using StandardAnalyzer,
>>>>>>>>>>>        >> then all of your terms will be lowercased at indexing time, AFAIK,
>>>>>>>>>>>        >> but your query will not be analyzed until you run a QueryParser on it.
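The casing point can be checked with plain edit distance. A FuzzyQuery built directly from `new Term(field, "MAINS")` is not analyzed, so it is compared as-is against the lowercased indexed term `main`, and by default FuzzyQuery accepts at most 2 edits. The sketch below uses plain Levenshtein distance (FuzzyQuery by default also counts a transposition as one edit, which does not change these examples); the class name `EditDistanceSketch` is hypothetical.

```java
import java.util.Locale;

// Plain Levenshtein distance (insertions, deletions, substitutions).
// Comparison is case-sensitive, like Lucene term comparison.
class EditDistanceSketch {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turn "" into the first j chars of b with j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexed = "main"; // what StandardAnalyzer stores for "MAIN"
        System.out.println(distance("MAINS", indexed));                          // 5: outside 2 edits
        System.out.println(distance("MAINS".toLowerCase(Locale.ROOT), indexed)); // 1: within 2 edits
    }
}
```

So lowercasing the term before building the FuzzyQuery, or parsing it with a QueryParser as Atri suggests, should let `MAINS` reach `main` within 2 edits.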
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> Atri
>>>>>>>>>>>        >>
>>>>>>>>>>>        >
>>>>>>>>>>>        >
>>>>>>>>>>>        >
>>>>>>>>>>>        > ---------------------------------------------------------------------
>>>>>>>>>>>        > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>>>        > For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>>        >
>>>>>>>>>>>