lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: FuzzyQuery- why is it ignored?
Date Thu, 13 Jun 2019 14:35:07 GMT
However, the index does not have MAINS but MAIN for the expected entry.

Best regards



On 6/13/19 10:33 AM, baris.kazar@oracle.com wrote:
> does it treat it like a plural word? :) :) :)
> That makes sense.
>
> Best regards
>
>
> On 6/13/19 10:31 AM, baris.kazar@oracle.com wrote:
>> Erick,
>>
>> Cool, could You give a simple example with my example please?
>>
>> Best regards
>>
>>
>>
>> On 6/13/19 10:12 AM, Erick Erickson wrote:
>>> Shot in the dark: stemming. Whenever I see a problem with something 
>>> ending in “s” (or “er” or “ing” or….) my first suspect is that
>>> stemming is turned on. In that case the token in the index that’s 
>>> actually searched on is somewhat different than you expect.
>>>
>>> The test is easy, just ensure your fieldType contains no stemmers. 
>>> PorterStemmer is particularly aggressive, but for this case to test 
>>> I’d just remove all stemming, re-index and see if the results differ.
>>>
>>> Best,
>>> Erick
>>>
>>>> On Jun 13, 2019, at 7:26 AM, baris.kazar@oracle.com wrote:
>>>>
>>>> Tomoko,-
>>>>
>>>>   That is strange indeed.
>>>>
>>>> Something is wrong when i use mains, but maink, mainl, mainr, mainq, 
>>>> maint all work ok: any consonant at the end except s works in this 
>>>> case.
>>>>
>>>> Case #3 had +contentDFLT:mains~2 but not +contentDFLT:"mains~2".
>>>>
>>>> i am using fuzzy query with ~ from Query.builder and that is not 
>>>> PhraseQuery.
>>>>
>>>> Similarly FuzzyQuery with input "mains" (it has to be lowercase 
>>>> since it does not go through StandardAnalyzer) is also not 
>>>> PhraseQuery.
>>>>
>>>> can there be a clearer sample case for ComplexPhraseQuery please in 
>>>> the docs?
>>>>
>>>> did You also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED 
>>>> STATES" the expected output in this case?
>>>>
>>>> Thanks for spending time on this, i would like to thank everyone.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/13/19 12:13 AM, Tomoko Uchida wrote:
>>>>> Hi,
>>>>>
>>>>>> Ok, i think only this very specific only "mains" has an issue.
>>>>> It looks strange to me. I did some test locally.
>>>>>
>>>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES".
>>>>>
>>>>> 2a. This query string (just copied from your Case #3) worked 
>>>>> correctly
>>>>> for me as far as I can see.
>>>>> +contentDFLT:mains~2 +contentDFLT:"nashua",
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united state"
>>>>>
>>>>> 2b. However this query string got no results.
>>>>> +contentDFLT:"mains~2", +contentDFLT:"nashua",
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
>>>>> It is an expected behaviour because the classic query parser does not
>>>>> support fuzzy query inside phrase query (as far as I know).
>>>>>
>>>>> I suspect you use fuzzy query operator (~) inside phrase query 
>>>>> ("), as
>>>>> the 2b case.
>>>>>
>>>>> FYI: there is a special parser for such complex phrase query.
>>>>> https://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
>>>>>
>>>>>
>>>>> Tomoko
>>>>>
>>>>> On Thu, Jun 13, 2019 at 6:16 <baris.kazar@oracle.com> wrote:
>>>>>> Ok, i think only this very specific only "mains" has an issue.
>>>>>>
>>>>>> all i knew about Lucene was fine :) Great...
>>>>>>
>>>>>> i have one more question:
>>>>>>
>>>>>> which one is advised to use: FuzzyQuery or the Query.parser with
>>>>>> search string~ appended?
>>>>>>
>>>>>> The second one will go through analyzer and make search string 
>>>>>> lowercase.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/12/19 1:03 PM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> Hi again,-
>>>>>>
>>>>>> this is really interesting and i hope i am missing something. 
>>>>>> The index lowercases all entries, so case sensitivity is not an issue
>>>>>> i think.
>>>>>>
>>>>>> Case #1:
>>>>>>
>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>>> Query q1 = null;
>>>>>> try {
>>>>>>     q1 = parser.parse("Main");
>>>>>> } catch (ParseException e) {
>>>>>>     e.printStackTrace();
>>>>>> }
>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>>
>>>>>>
>>>>>> This returns the following:
>>>>>>
>>>>>> query plan:
>>>>>>
>>>>>> [+contentDFLT:main, +contentDFLT:"nashua", 
>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>
>>>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after
>>>>>> exec finished)
>>>>>>
>>>>>> Number of results: 12
>>>>>> Name: Main Dunstable Rd
>>>>>> Score: 41.204945
>>>>>> ID: 12677400
>>>>>> Country Code: US
>>>>>> Coordinates: 42.72631, -71.50269
>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>>> UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681980
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76416, -71.46681
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681973
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75045, -71.4607
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681974
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76019, -71.465
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main Dunstable Rd
>>>>>> Score: 41.204945
>>>>>> ID: 12677399
>>>>>> Country Code: US
>>>>>> Coordinates: 42.74641, -71.48943
>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>>> UNITED STATES
>>>>>>
>>>>>> Name: S Main St
>>>>>> Score: 41.204945
>>>>>> ID: 11893215
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73412, -71.44797
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681978
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73492, -71.44951
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: S Main St
>>>>>> Score: 41.204945
>>>>>> ID: 11893214
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73958, -71.45895
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681979
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76416, -71.46681
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.204945
>>>>>> ID: 12681977
>>>>>> Country Code: US
>>>>>> Coordinates: 42.747, -71.45957
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>>
>>>>>>
>>>>>> Case #2
>>>>>>
>>>>>> When i did this it also worked, adding ~ to the word Main to make it a
>>>>>> fuzzy query:
>>>>>>
>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>>> Query q1 = null;
>>>>>> try {
>>>>>>     q1 = parser.parse("Main~");
>>>>>> } catch (ParseException e) {
>>>>>>     e.printStackTrace();
>>>>>> }
>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>>
>>>>>>
>>>>>> query plan:
>>>>>>
>>>>>> [+contentDFLT:main~2, +contentDFLT:"nashua", 
>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>
>>>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging 
>>>>>> stops)
>>>>>> Number of results: 12
>>>>>> Name: Main Dunstable Rd
>>>>>> Score: 41.06405
>>>>>> ID: 12677400
>>>>>> Country Code: US
>>>>>> Coordinates: 42.72631, -71.50269
>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>>> UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681980
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76416, -71.46681
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681973
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75045, -71.4607
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681974
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76019, -71.465
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main Dunstable Rd
>>>>>> Score: 41.06405
>>>>>> ID: 12677399
>>>>>> Country Code: US
>>>>>> Coordinates: 42.74641, -71.48943
>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>>> UNITED STATES
>>>>>>
>>>>>> Name: S Main St
>>>>>> Score: 41.06405
>>>>>> ID: 11893215
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73412, -71.44797
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681978
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73492, -71.44951
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: S Main St
>>>>>> Score: 41.06405
>>>>>> ID: 11893214
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73958, -71.45895
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681979
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76416, -71.46681
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 41.06405
>>>>>> ID: 12681977
>>>>>> Country Code: US
>>>>>> Coordinates: 42.747, -71.45957
>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Case #3
>>>>>>
>>>>>> But why does this not work in fuzzy mode? i misspelled it a bit
>>>>>> (1 edit away) and, as You saw, the data is there with the Main spelling:
>>>>>>
>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>>> Query q1 = null;
>>>>>> try {
>>>>>>     q1 = parser.parse("Mains~");  // 1 edit away
>>>>>> } catch (ParseException e) {
>>>>>>     e.printStackTrace();
>>>>>> }
>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>>
>>>>>> query plan:
>>>>>>
>>>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua", 
>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>
>>>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging 
>>>>>> stops)
>>>>>>
>>>>>> Number of results: 0
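
As a quick sanity check on the "1 edit away" remark in Case #3, the sketch below computes a plain Levenshtein distance in standalone Java. It is not Lucene's implementation (FuzzyQuery builds an automaton and, by default, also counts a transposition as a single edit); the class name EditDistanceCheck and its method are made up for illustration. It only confirms that "mains" is one edit from "main", so the ~2 shown in the query plan would normally be enough, and that an un-lowercased "MAINS" is five edits from "main".

```java
final class EditDistanceCheck {

    /** Classic Levenshtein distance (insert, delete, substitute), two-row DP. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Query term after lowercasing vs. the indexed term.
        System.out.println(levenshtein("mains", "main"));
        // Raw uppercase term vs. a lowercased indexed term.
        System.out.println(levenshtein("MAINS", "main"));
    }
}
```

On this evidence the edit distance alone does not explain the empty result in Case #3. Letter case, on the other hand, does matter whenever a term is handed to FuzzyQuery directly without going through an analyzer.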
>>>>>>
>>>>>>
>>>>>>
>>>>>> Case #4
>>>>>>
>>>>>> Then i changed q1 from MUST to SHOULD above, and i think the fuzzy 
>>>>>> query is ignored here since there is no MAIN in the first 468 
>>>>>> results:
>>>>>>
>>>>>> there is no boost for Mains term here.
>>>>>>
>>>>>> query plan:
>>>>>>
>>>>>> [contentDFLT:mains~2, +contentDFLT:"nashua", 
>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>
>>>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging 
>>>>>> stops)
>>>>>> Number of results: 1794
>>>>>> Name: Nashua Dr
>>>>>> Score: 34.186226
>>>>>> ID: 4974936
>>>>>> Country Code: US
>>>>>> Coordinates: 42.7636, -71.46063
>>>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Nashua River Rail Trl
>>>>>> Score: 34.186226
>>>>>> ID: 4975508
>>>>>> Country Code: US
>>>>>> Coordinates: 42.7062, -71.53962
>>>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>>> UNITED STATES
>>>>>>
>>>>>> Name: Nashua Rd
>>>>>> Score: 33.84896
>>>>>> ID: 4975388
>>>>>> Country Code: US
>>>>>> Coordinates: 42.78746, -71.92823
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: NASHUA
>>>>>> Score: 33.84896
>>>>>> ID: 21014865
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75873, -71.46438
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: NASHUA
>>>>>> Score: 33.84896
>>>>>> ID: 21014865
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75873, -71.46438
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: NASHUA
>>>>>> Score: 33.84896
>>>>>> ID: 21014865
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75873, -71.46438
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: NASHUA
>>>>>> Score: 33.84896
>>>>>> ID: 21014865
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75873, -71.46438
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: NASHUA
>>>>>> Score: 33.84896
>>>>>> ID: 21014865
>>>>>> Country Code: US
>>>>>> Coordinates: 42.75873, -71.46438
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Nashua St
>>>>>> Score: 33.84896
>>>>>> ID: 4975671
>>>>>> Country Code: US
>>>>>> Coordinates: 42.88471, -70.81687
>>>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>> Name: Nashua Rd
>>>>>> Score: 33.84896
>>>>>> ID: 4975400
>>>>>> Country Code: US
>>>>>> Coordinates: 42.79014, -71.92364
>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>
>>>>>>
>>>>>> Why is the fuzzy query ignored?
>>>>>> Even if i have separate fields for street, city, region, country,
>>>>>> this fuzzy query issue will come into play for names with 
>>>>>> multiple parts like main dunstable etc., right?
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> On 6/12/19 11:36 AM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> Tomoko,-
>>>>>>
>>>>>>   Thank You for Your suggestions. i am trying to understand it 
>>>>>> and i thought i did :)
>>>>>>
>>>>>> but it does not work with FuzzyQuery when i used it with a *single*
>>>>>> large TextField like street=...value... city=...value... 
>>>>>> region=...value... country=...value... (with or without quotes 
>>>>>> for the values)
>>>>>>
>>>>>> What i knew about Lucene fuzzy queries is not holding now with 
>>>>>> this TextField form. That is why i suspected a bug.
>>>>>>
>>>>>> 1. Yes, i saw and have a solid proof on that now.
>>>>>>
>>>>>> 2. yes, but FuzzyQuery takes quotes as they are, since they are 
>>>>>> escaped and the term is not analyzed.
>>>>>>
>>>>>> Stuffing into one textfield vs having separate fields should probably
>>>>>> only affect the performance but not the outcome in my case.
>>>>>> But, i have been thinking about this and maybe it is the way to 
>>>>>> go in this case.
>>>>>>
>>>>>> My CONTENT field has street names in mixed case and city, region,
>>>>>> country names in UPPERCASE. Can this be a problem?
>>>>>> i thought index stored them in lowercase since i am using 
>>>>>> StandardAnalyzer.
>>>>>>
>>>>>> CONTENT field also has full textfield string with street=... 
>>>>>> city=... region=... country=... (here all values are UPPERCASE).
>>>>>>
>>>>>> Why cant the index find the names via FuzzyQuery? i tried both 
>>>>>> FuzzyQuery and Query builder as i showed before.
>>>>>>
>>>>>> The last advice in Your previous email would nicely go outside 
>>>>>> the parentheses since it might be very critical :) :) :)
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote:
>>>>>>
>>>>>> I'd suggest understanding how the software works before
>>>>>> suspecting a bug in it :-)
>>>>>>
>>>>>> I guess you may miss two points:
>>>>>>
>>>>>> 1. the standard analyzer (standard tokenizer) breaks words at double
>>>>>> quotes (U+0022), so quotes are not indexed or searched at all if you
>>>>>> are using the standard analyzer. (That is the reason you get the same
>>>>>> results with or without quotes.)
>>>>>> See: 
>>>>>> https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
>>>>>> and 
>>>>>> http://unicode.org/reports/tr29/
>>>>>>
>>>>>> 2. the double quote has a special meaning (it's interpreted as a phrase
>>>>>> query) with the built-in query parser, so you need to escape it if you
>>>>>> want to search for the double quote itself.
>>>>>> See: 
>>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
>>>>>>
>>>>>> (My advice would be to create separate fields for each key value
>>>>>> pair instead of stuffing all pairs into one text field, if you need to
>>>>>> search them separately.)
>>>>>>
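
Point 1 above can be illustrated without Lucene. The sketch below is only a rough stand-in for StandardAnalyzer, assuming it is enough to split on anything that is not a letter or digit and then lowercase; the real StandardTokenizer follows the Unicode UAX #29 word-break rules and treats some punctuation differently. SimpleTokenizeSketch and tokenize are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SimpleTokenizeSketch {

    /** Approximation only: split on non-letter/non-digit runs, then lowercase. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty leading piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Indexed text in the key="value" form used in this thread.
        System.out.println(tokenize("street=\"MAIN\" city=\"NASHUA\" region=\"NEW HAMPSHIRE\""));
        // The same text without '=' and quotes.
        System.out.println(tokenize("street MAIN city NASHUA region NEW HAMPSHIRE"));
    }
}
```

Under that assumption both inputs yield the same tokens, which is consistent with getting identical results with or without quotes.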
>>>>>> On Wed, Jun 12, 2019 at 2:39 <baris.kazar@oracle.com> wrote:
>>>>>>
>>>>>> i can say that quotes are not the issue with the index, as it still 
>>>>>> gives the same results with or without quotes.
>>>>>>
>>>>>> i am starting to feel that this might be a bug maybe??
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> Somehow " is causing an issue as this should return street with 
>>>>>> MAIN:
>>>>>>
>>>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
>>>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
>>>>>> states"] -> this was with fuzzyquery on MAINS
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>>> contentDFLT:mains]
>>>>>>
>>>>>> QueryParser chops it into two pieces from
>>>>>> parser.parse("street=\"MAINS\"");
>>>>>>
>>>>>> The index has a TextField named contentDFLT with the following data:
>>>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
>>>>>> HAMPSHIRE" country="UNITED STATES"
>>>>>>
>>>>>>
>>>>>> When i set street=\"MAINS~\" with parser:
>>>>>> i get the following
>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>>> contentDFLT:mains]
>>>>>>
>>>>>> probably " quotations are messing this up as You were saying...
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
>>>>>>
>>>>>> Or, " (double quotation) in your query string may affect query 
>>>>>> parsing.
>>>>>>
>>>>>> When I parse this string by classic query parser (lucene 8.1),
>>>>>> street="MAINS~"
>>>>>> parsed (raw) query is
>>>>>> text:street text:mains
>>>>>> (I set the default search field to "text", so text:xxxx appears
>>>>>> here.)
>>>>>>
>>>>>> Query parsing is a complex process, so it would be good to check
>>>>>> parsed raw query string especially when you have (reserved) special
>>>>>> characters in your query...
>>>>>>
>>>>>> On Tue, Jun 11, 2019 at 1:10 Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I noticed one small thing in your previous mail.
>>>>>>
>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>>>>>
>>>>>> which is good.
>>>>>>
>>>>>> To specify a search field, ":" (colon) should be used instead of
>>>>>> "=".
>>>>>> See the query parser documentation:
>>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not sure this is related to your problem.
>>>>>>
>>>>>> On Tue, Jun 11, 2019 at 0:51 <baris.kazar@oracle.com> wrote:
>>>>>>
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>>>     BooleanClause.Occur.MUST);
>>>>>>
>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>>> Query q1 = null;
>>>>>> try {
>>>>>>     q1 = parser.parse("MAIN");
>>>>>> } catch (ParseException e) {
>>>>>>     e.printStackTrace();
>>>>>> }
>>>>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>>>>
>>>>>> testQuerySearch2 Time to compute: 0 seconds
>>>>>> Number of results: 1775
>>>>>> Name: Main St
>>>>>> Score: 37.20959
>>>>>> ID: 12681979
>>>>>> Country Code: US
>>>>>> Coordinates: 42.76416, -71.46681
>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 37.20959
>>>>>> ID: 12681977
>>>>>> Country Code: US
>>>>>> Coordinates: 42.747, -71.45957
>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>
>>>>>> Name: Main St
>>>>>> Score: 37.20959
>>>>>> ID: 12681978
>>>>>> Country Code: US
>>>>>> Coordinates: 42.73492, -71.44951
>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>
>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>>>>> which is good.
>>>>>>
>>>>>> But when i switch to MAINS~ then fuzzy query does not work.
>>>>>>
>>>>>>
>>>>>> i need to say something about q1 alone in the booleanquery:
>>>>>> it tries to match MAIN in street, city, region and country, which are
>>>>>> in a single TextField field.
>>>>>> But i dont want this. that is why i need street="..." etc when
>>>>>> searching.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> just for the basic verification, can you find the document without
>>>>>> fuzzy query? I mean, does this query work for you?
>>>>>>
>>>>>> Query query = parser.parse("MAIN");
>>>>>>
>>>>>> Tomoko
>>>>>>
>>>>>> On Tue, Jun 11, 2019 at 0:22 <baris.kazar@oracle.com> wrote:
>>>>>>
>>>>>> why does the second set not work at all?
>>>>>>
>>>>>> it is indexed as Textfield like street="..." city="..." etc.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> i dont know how to use Fuzzyquery with queryparser but probably You
>>>>>> are suggesting
>>>>>>
>>>>>> QueryParser parser = new QueryParser(field, analyzer) ;
>>>>>> Query query = parser.parse("MAINS~2");
>>>>>>
>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>>>
>>>>>> am i right?
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>>>
>>>>>> I would suggest using a QueryParser for your fuzzy query before
>>>>>> adding it to the Boolean query. This should weed out any case
>>>>>> issues.
>>>>>>
>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com
>>>>>> <mailto:baris.kazar@oracle.com>> wrote:
>>>>>>
>>>>>>         BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>>>
>>>>>>         // First set
>>>>>>         booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
>>>>>>             BooleanClause.Occur.SHOULD);
>>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>             BooleanClause.Occur.MUST);
>>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>             BooleanClause.Occur.MUST);
>>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>             BooleanClause.Occur.MUST);
>>>>>>
>>>>>>         // Second set
>>>>>>         //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>>>         //    BooleanClause.Occur.SHOULD);
>>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>>>         //    BooleanClause.Occur.MUST);
>>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>>>         //    BooleanClause.Occur.MUST);
>>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>>>         //    BooleanClause.Occur.MUST);
>>>>>>
>>>>>>         The first set also brings streets with Nashua in the name (NASHUA).
>>>>>>
>>>>>>         so, to prevent that, and since i also indexed with street="..."
>>>>>>         city="...", i did the second set but it does not bring anything.
>>>>>>
>>>>>>         createPhraseQuery builds a Phrasequery with one term equal to the
>>>>>>         string in the call.
>>>>>>
>>>>>>         Best regards
>>>>>>
>>>>>>
>>>>>>
>>>>>>         On 6/10/19 10:47 AM, baris.kazar@oracle.com
>>>>>>         <mailto:baris.kazar@oracle.com> wrote:
>>>>>>         > How do i check how it is indexed? lowercase or uppercase?
>>>>>>         >
>>>>>>         > the only way now is by testing.
>>>>>>         >
>>>>>>         > i am using standardanalyzer.
>>>>>>         >
>>>>>>         > Best regards
>>>>>>         >
>>>>>>         >
>>>>>>         > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>>>         >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
>>>>>>         >> <tomoko.uchida.1111@gmail.com
>>>>>> <mailto:tomoko.uchida.1111@gmail.com>> wrote:
>>>>>>         >>> Hi,
>>>>>>         >>>
>>>>>>         >>> What analyzer do you use for the text field? Is the term "Main"
>>>>>>         >>> correctly indexed?
>>>>>>         >> Agreed. Also, it would be good if you could post your actual
>>>>>>         >> code.
>>>>>>         >>
>>>>>>         >> What analyzer are you using? If you are using StandardAnalyzer,
>>>>>>         >> then all of your terms while indexing will be lowercased, AFAIK,
>>>>>>         >> but your query will not be analyzed until you run a QueryParser
>>>>>>         >> on it.
>>>>>>         >>
>>>>>>         >>
>>>>>>         >> Atri
>>>>>>         >>
>>>>>>         >
>>>>>>         >
>>>>>>         >
>>>>>> ---------------------------------------------------------------------

>>>>>>
>>>>>>
>>>>>>         > To unsubscribe, e-mail:
>>>>>> java-user-unsubscribe@lucene.apache.org
>>>>>> <mailto:java-user-unsubscribe@lucene.apache.org>
>>>>>>         > For additional commands, e-mail:
>>>>>>         java-user-help@lucene.apache.org
>>>>>> <mailto:java-user-help@lucene.apache.org>
>>>>>>         >
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

