lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: FuzzyQuery- why is it ignored?
Date Thu, 13 Jun 2019 14:31:27 GMT
Erick,

Cool, could you give a simple example based on my case, please?

Best regards



On 6/13/19 10:12 AM, Erick Erickson wrote:
> Shot in the dark: stemming. Whenever I see a problem with something ending in “s”
> (or “er” or “ing” or…) my first suspect is that stemming is turned on. In that
> case the token in the index that’s actually searched on is somewhat different than you expect.
>
> The test is easy: just ensure your fieldType contains no stemmers. PorterStemmer is particularly
> aggressive, but to test this case I’d just remove all stemming, re-index, and see if
> the results differ.
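
To make the stemming hypothesis above concrete, here is a minimal Java sketch. It does not use Lucene: `toyStem` and `analyze` are hypothetical helpers with a made-up rule (lowercase, then strip one trailing "s"), not PorterStemmer or any real token filter. It only illustrates how the term stored in an index can differ from the original text, and it does not establish that stemming is the cause in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration only: NOT Lucene's PorterStemmer or any real stemmer. */
final class ToyStemDemo {

    /** Hypothetical rule: lowercase, then drop one trailing "s" from words longer than 3 chars. */
    static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    /** Splits on whitespace and applies the toy rule to every token. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                out.add(toyStem(raw));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Terms as they would be stored under the toy rule.
        System.out.println(analyze("MAIN NASHUA UNITED STATES"));
        // A query word ending in "s" is rewritten; one ending in "k" is not.
        System.out.println(toyStem("Mains") + " vs " + toyStem("Maink"));
    }
}
```

Under this toy rule, "STATES" is stored as "state" and "Mains" reduces to "main", while "Maink" is left alone. Whether a real analyzer chain behaves similarly depends entirely on the filters configured for the field.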
>
> Best,
> Erick
>
>> On Jun 13, 2019, at 7:26 AM, baris.kazar@oracle.com wrote:
>>
>> Tomoko,-
>>
>>   That is strange indeed.
>>
>> Something is wrong when I use mains, but maink, mainl, mainr, mainq, and maint all work
>> OK; any consonant at the end except s works in this case.
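
As a sanity check on the observation above, the following sketch computes plain Levenshtein distances from each query variant to the indexed term "main". It is a standalone illustration with hypothetical class and method names, not Lucene's implementation; Lucene's own fuzzy matching may additionally treat a transposition as a single edit, which makes no difference for these inputs.

```java
final class EditDistanceDemo {

    /** Classic two-row dynamic-programming Levenshtein distance (case-sensitive). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexed = "main";
        for (String q : new String[] {"mains", "maink", "mainl", "mainr", "mainq", "maint"}) {
            System.out.println(q + " -> " + levenshtein(q, indexed));
        }
        // An un-analyzed uppercase term compared against a lowercased index term.
        System.out.println("MAINS -> " + levenshtein("MAINS", indexed));
    }
}
```

Every lowercase variant, including "mains", is one edit away from "main" and therefore within the `~2` limit shown in the query plans, so edit distance alone does not seem to explain why only "mains" fails. The uppercase "MAINS" is 5 edits away under a case-sensitive comparison.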
>>
>> Case #3 had +contentDFLT:mains~2 but not +contentDFLT:"mains~2".
>>
>> I am using a fuzzy query with ~ from Query.builder, and that is not a PhraseQuery.
>>
>> Similarly, FuzzyQuery with input "mains" (it has to be lowercase since it does not
>> go through StandardAnalyzer) is also not a PhraseQuery.
>>
>> Can there be a clearer sample case for ComplexPhraseQuery in the docs, please?
>>
>> Did you also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES", the expected
>> output in this case?
>>
>> Thanks for spending time on this, i would like to thank everyone.
>>
>> Best regards
>>
>>
>> On 6/13/19 12:13 AM, Tomoko Uchida wrote:
>>> Hi,
>>>
>>>> OK, I think only this very specific term, "mains", has an issue.
>>> It looks strange to me. I did some tests locally.
>>>
>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES".
>>>
>>> 2a. This query string (just copied from your Case #3) worked correctly
>>> for me as far as I can see.
>>> +contentDFLT:mains~2 +contentDFLT:"nashua",
>>> +contentDFLT:"new-hampshire", +contentDFLT:"united state"
>>>
>>> 2b. However this query string got no results.
>>> +contentDFLT:"mains~2", +contentDFLT:"nashua",
>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
>>> It is an expected behaviour because the classic query parser does not
>>> support fuzzy query inside phrase query (as far as I know).
>>>
>>> I suspect you use fuzzy query operator (~) inside phrase query ("), as
>>> the 2b case.
>>>
>>> FYI: there is a special parser for such complex phrase query.
>>> https://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
>>>
>>> Tomoko
>>>
>>> On Thu, Jun 13, 2019 at 6:16, <baris.kazar@oracle.com> wrote:
>>>> OK, I think only this very specific term, "mains", has an issue.
>>>>
>>>> all i knew about Lucene was fine :) Great...
>>>>
>>>> i have one more question:
>>>>
>>>> Which one is advised: FuzzyQuery, or the Query.parser with ~ appended to the
>>>> search string?
>>>>
>>>> The second one will go through the analyzer and lowercase the search string.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/12/19 1:03 PM, baris.kazar@oracle.com wrote:
>>>>
>>>> Hi again,-
>>>>
>>>> This is really interesting and I hope I am missing something. The index
>>>> lowercases all entries, so case sensitivity is not an issue, I think.
>>>>
>>>> Case #1:
>>>>
>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>          new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>          Query q1 = null;
>>>>          try {
>>>>              q1 = parser.parse("Main");
>>>>          } catch (ParseException e) {
>>>>              e.printStackTrace();
>>>>          }
>>>>          booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>                  BooleanClause.Occur.MUST);
>>>>
>>>>
>>>> This yields the following:
>>>>
>>>> query plan:
>>>>
>>>> [+contentDFLT:main, +contentDFLT:"nashua", +contentDFLT:"new-hampshire",
>>>> +contentDFLT:"united states"]
>>>>
>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after exec finished)
>>>>
>>>> Number of results: 12
>>>> Name: Main Dunstable Rd
>>>> Score: 41.204945
>>>> ID: 12677400
>>>> Country Code: US
>>>> Coordinates: 42.72631, -71.50269
>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681980
>>>> Country Code: US
>>>> Coordinates: 42.76416, -71.46681
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681973
>>>> Country Code: US
>>>> Coordinates: 42.75045, -71.4607
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681974
>>>> Country Code: US
>>>> Coordinates: 42.76019, -71.465
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main Dunstable Rd
>>>> Score: 41.204945
>>>> ID: 12677399
>>>> Country Code: US
>>>> Coordinates: 42.74641, -71.48943
>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: S Main St
>>>> Score: 41.204945
>>>> ID: 11893215
>>>> Country Code: US
>>>> Coordinates: 42.73412, -71.44797
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681978
>>>> Country Code: US
>>>> Coordinates: 42.73492, -71.44951
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: S Main St
>>>> Score: 41.204945
>>>> ID: 11893214
>>>> Country Code: US
>>>> Coordinates: 42.73958, -71.45895
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681979
>>>> Country Code: US
>>>> Coordinates: 42.76416, -71.46681
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.204945
>>>> ID: 12681977
>>>> Country Code: US
>>>> Coordinates: 42.747, -71.45957
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>>
>>>>
>>>> Case #2
>>>>
>>>> When I did this, adding ~ to make "Main" a fuzzy query, it also worked:
>>>>
>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>          new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>          Query q1 = null;
>>>>          try {
>>>>              q1 = parser.parse("Main~");
>>>>          } catch (ParseException e) {
>>>>              e.printStackTrace();
>>>>          }
>>>>          booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>                  BooleanClause.Occur.MUST);
>>>>
>>>>
>>>> query plan:
>>>>
>>>> [+contentDFLT:main~2, +contentDFLT:"nashua", +contentDFLT:"new-hampshire",
>>>> +contentDFLT:"united states"]
>>>>
>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging stops)
>>>> Number of results: 12
>>>> Name: Main Dunstable Rd
>>>> Score: 41.06405
>>>> ID: 12677400
>>>> Country Code: US
>>>> Coordinates: 42.72631, -71.50269
>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681980
>>>> Country Code: US
>>>> Coordinates: 42.76416, -71.46681
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681973
>>>> Country Code: US
>>>> Coordinates: 42.75045, -71.4607
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681974
>>>> Country Code: US
>>>> Coordinates: 42.76019, -71.465
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main Dunstable Rd
>>>> Score: 41.06405
>>>> ID: 12677399
>>>> Country Code: US
>>>> Coordinates: 42.74641, -71.48943
>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: S Main St
>>>> Score: 41.06405
>>>> ID: 11893215
>>>> Country Code: US
>>>> Coordinates: 42.73412, -71.44797
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681978
>>>> Country Code: US
>>>> Coordinates: 42.73492, -71.44951
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: S Main St
>>>> Score: 41.06405
>>>> ID: 11893214
>>>> Country Code: US
>>>> Coordinates: 42.73958, -71.45895
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681979
>>>> Country Code: US
>>>> Coordinates: 42.76416, -71.46681
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Main St
>>>> Score: 41.06405
>>>> ID: 12681977
>>>> Country Code: US
>>>> Coordinates: 42.747, -71.45957
>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>>
>>>>
>>>>
>>>> Case #3
>>>>
>>>> But why does this not work in fuzzy mode when I misspell the term slightly (1 edit
>>>> away)? As you saw, the data is there with the Main spelling:
>>>>
>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>          new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>
>>>>          Query q1 = null;
>>>>          try {
>>>>              q1 = parser.parse("Mains~");  // 1 edit away
>>>>          } catch (ParseException e) {
>>>>              e.printStackTrace();
>>>>          }
>>>>          booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>                  BooleanClause.Occur.MUST);
>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>                  BooleanClause.Occur.MUST);
>>>>
>>>> query plan:
>>>>
>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua", +contentDFLT:"new-hampshire",
>>>> +contentDFLT:"united states"]
>>>>
>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging stops)
>>>>
>>>> Number of results: 0
>>>>
>>>>
>>>>
>>>> Case #4
>>>>
>>>> Then I changed q1 from MUST to SHOULD above, and I think the fuzzy query is ignored
>>>> here, since there is no MAIN in the first 468 results:
>>>>
>>>> There is no boost for the Mains term here.
>>>>
>>>> query plan:
>>>>
>>>> [contentDFLT:mains~2, +contentDFLT:"nashua", +contentDFLT:"new-hampshire",
>>>> +contentDFLT:"united states"]
>>>>
>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging stops)
>>>> Number of results: 1794
>>>> Name: Nashua Dr
>>>> Score: 34.186226
>>>> ID: 4974936
>>>> Country Code: US
>>>> Coordinates: 42.7636, -71.46063
>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Nashua River Rail Trl
>>>> Score: 34.186226
>>>> ID: 4975508
>>>> Country Code: US
>>>> Coordinates: 42.7062, -71.53962
>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Nashua Rd
>>>> Score: 33.84896
>>>> ID: 4975388
>>>> Country Code: US
>>>> Coordinates: 42.78746, -71.92823
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: NASHUA
>>>> Score: 33.84896
>>>> ID: 21014865
>>>> Country Code: US
>>>> Coordinates: 42.75873, -71.46438
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: NASHUA
>>>> Score: 33.84896
>>>> ID: 21014865
>>>> Country Code: US
>>>> Coordinates: 42.75873, -71.46438
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: NASHUA
>>>> Score: 33.84896
>>>> ID: 21014865
>>>> Country Code: US
>>>> Coordinates: 42.75873, -71.46438
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: NASHUA
>>>> Score: 33.84896
>>>> ID: 21014865
>>>> Country Code: US
>>>> Coordinates: 42.75873, -71.46438
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: NASHUA
>>>> Score: 33.84896
>>>> ID: 21014865
>>>> Country Code: US
>>>> Coordinates: 42.75873, -71.46438
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Nashua St
>>>> Score: 33.84896
>>>> ID: 4975671
>>>> Country Code: US
>>>> Coordinates: 42.88471, -70.81687
>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES
>>>>
>>>> Name: Nashua Rd
>>>> Score: 33.84896
>>>> ID: 4975400
>>>> Country Code: US
>>>> Coordinates: 42.79014, -71.92364
>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>
>>>>
>>>> Why is the fuzzy query ignored?
>>>> Even if I have separate fields for street, city, region, and country, this fuzzy
>>>> query issue will still arise for multi-word values like "main dunstable", right?
>>>>
>>>> Best regards
>>>>
>>>> On 6/12/19 11:36 AM, baris.kazar@oracle.com wrote:
>>>>
>>>> Tomoko,-
>>>>
>>>>   Thank you for your suggestions. I am trying to understand it and I thought
>>>> I did :)
>>>>
>>>> But it does not work with FuzzyQuery when I use a *single* large TextField
>>>> like street=...value... city=...value... region=...value... country=...value... (with or without
>>>> quotes around the values).
>>>>
>>>> What I knew about Lucene fuzzy queries does not hold with this TextField
>>>> form. That is why I suspected a bug.
>>>>
>>>> 1. Yes, I saw that and have solid proof of it now.
>>>>
>>>> 2. Yes, but FuzzyQuery takes the quotes as they are, since they are escaped and the term
>>>> is not analyzed.
>>>>
>>>> Stuffing everything into one TextField vs. having separate fields should probably only
>>>> affect performance, not the outcome, in my case.
>>>> But I have been thinking about this, and maybe separate fields are the way to go in this
>>>> case.
>>>>
>>>> My CONTENT field has street names in mixed case and city, region, and country
>>>> names in UPPERCASE. Can this be a problem?
>>>> I thought the index stored them in lowercase since I am using StandardAnalyzer.
>>>>
>>>> The CONTENT field also has the full TextField string with street=... city=... region=...
>>>> country=... (here all values are UPPERCASE).
>>>>
>>>> Why can't the index find the names via FuzzyQuery? I tried both FuzzyQuery
>>>> and Query builder, as I showed before.
>>>>
>>>> The last piece of advice in your previous email deserves to be outside the parentheses,
>>>> since it might be very critical :) :) :)
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote:
>>>>
>>>> I'd suggest understanding how the software works before
>>>> suspecting a bug :-)
>>>>
>>>> I guess you may be missing two points:
>>>>
>>>> 1. The standard analyzer (standard tokenizer) breaks words at the double
>>>> quote (U+0022), so quotes are not indexed or searched at all if you are
>>>> using the standard analyzer. (That is the reason you get the same results
>>>> with or without quotes.)
>>>> See: https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
>>>> and http://unicode.org/reports/tr29/
>>>>
>>>> 2. The double quote has a special meaning (it is interpreted as a phrase query)
>>>> with the built-in query parser, so you need to escape it if you want to
>>>> search for the double quote itself.
>>>> See: http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
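
The first point above can be pictured with a rough stand-in for the tokenizer. The sketch below is an assumption-laden approximation, not StandardTokenizer: the hypothetical `tokenize` helper simply splits on every character that is not a letter or digit and lowercases the rest, whereas the real tokenizer follows the UAX #29 word-break rules.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleTokenizerDemo {

    /** Rough approximation: split on any non letter-or-digit character and lowercase. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Punctuation such as '=' and '"' ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("street=\"MAIN\" city=\"NASHUA\""));
        System.out.println(tokenize("\"MAIN\"").equals(tokenize("MAIN")));
    }
}
```

In this approximation the quotes and equals signs vanish, leaving the tokens street, main, city, nashua. That is consistent with the thread's observation that results are the same with or without quotes, and it also shows why `street=` inside a single text field does not scope a match to the street value.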
>>>>
>>>> (My advice would be to create separate fields for each key value pairs
>>>> instead of stuffing all pairs into one text field, if you need to
>>>> search them separately.)
>>>>
>>>> On Wed, Jun 12, 2019 at 2:39, <baris.kazar@oracle.com> wrote:
>>>>
>>>> i can say that quotes is not the issue with index as it still results in
>>>> same results with quotes or without quotes.
>>>>
>>>> i am starting to feel that this might be a bug maybe??
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
>>>>
>>>> Somehow " is causing an issue as this should return street with MAIN:
>>>>
>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
>>>> states"] -> this was with fuzzyquery on MAINS
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
>>>>
>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>> +contentDFLT:"country united states", contentDFLT:street
>>>> contentDFLT:mains]
>>>>
>>>> QueryParser chops it into two pieces from
>>>> parser.parse("street=\"MAINS\"");
>>>>
>>>> The index has a TextField named contentDFLT with the following data:
>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
>>>> HAMPSHIRE" country="UNITED STATES"
>>>>
>>>>
>>>> When i set street=\"MAINS~\" with parser:
>>>> i get the following
>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>> +contentDFLT:"country united states", contentDFLT:street
>>>> contentDFLT:mains]
>>>>
>>>> probably " quotations are messing this up as You were saying...
>>>> Best regards
>>>>
>>>>
>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
>>>>
>>>> Or, " (double quotation) in your query string may affect query parsing.
>>>>
>>>> When I parse this string by classic query parser (lucene 8.1),
>>>> street="MAINS~"
>>>> parsed (raw) query is
>>>> text:street text:mains
>>>> (I set the default search field to "text", so text:xxxx appears
>>>> here.)
>>>>
>>>> Query parsing is a complex process, so it would be good to check
>>>> parsed raw query string especially when you have (reserved) special
>>>> characters in your query...
>>>>
>>>> On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I noticed one small thing in your previous mail.
>>>>
>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>>>
>>>> which is good.
>>>>
>>>> To specify a search field, ":" (colon) should be used instead of "=".
>>>> See the query parser documentation:
>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
>>>>
>>>>
>>>> I'm not sure this is related to your problem.
>>>>
>>>> On Tue, Jun 11, 2019 at 0:51, <baris.kazar@oracle.com> wrote:
>>>>
>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>>
>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>> phraseAnalyzer) ;
>>>>             Query q1 = null;
>>>>             try {
>>>>                 q1 = parser.parse("MAIN");
>>>>             } catch (ParseException e) {
>>>>
>>>>                 e.printStackTrace();
>>>>             }
>>>>             booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>>
>>>> testQuerySearch2 Time to compute: 0 seconds
>>>> Number of results: 1775
>>>> Name: Main St
>>>> Score: 37.20959
>>>> ID: 12681979
>>>> Country Code: US
>>>> Coordinates: 42.76416, -71.46681
>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>
>>>> Name: Main St
>>>> Score: 37.20959
>>>> ID: 12681977
>>>> Country Code: US
>>>> Coordinates: 42.747, -71.45957
>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>
>>>> Name: Main St
>>>> Score: 37.20959
>>>> ID: 12681978
>>>> Country Code: US
>>>> Coordinates: 42.73492, -71.44951
>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>
>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same results,
>>>> which is good.
>>>>
>>>> But when I switch to MAINS~, the fuzzy query does not work.
>>>>
>>>>
>>>> One note about using q1 alone in the BooleanQuery:
>>>> it tries to match MAIN in street, city, region, and country, which are all
>>>> in a single TextField.
>>>> But I don't want this. That is why I need street="..." etc. when
>>>> searching.
>>>>
>>>> Best regards
>>>>
>>>>
>>>>
>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>>
>>>> Hi,
>>>>
>>>> just for the basic verification, can you find the document without
>>>> fuzzy query? I mean, does this query work for you?
>>>>
>>>> Query query = parser.parse("MAIN");
>>>>
>>>> Tomoko
>>>>
>>>> On Tue, Jun 11, 2019 at 0:22, <baris.kazar@oracle.com> wrote:
>>>>
>>>> Why does the second set not work at all?
>>>>
>>>> It is indexed as a TextField like street="..." city="..." etc.
>>>>
>>>> Best regards
>>>>
>>>>
>>>>
>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>
>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
>>>> probably suggesting:
>>>>
>>>> QueryParser parser = new QueryParser(field, analyzer) ;
>>>> Query query = parser.parse("MAINS~2");
>>>>
>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>
>>>> am i right?
>>>> Best regards
>>>>
>>>>
>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>
>>>> I would suggest using a QueryParser for your fuzzy query before
>>>> adding it to the Boolean query. This should weed out any case
>>>> issues.
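
The case issue mentioned above can be sketched without Lucene. In the example below, `INDEXED_TERMS` is a hypothetical stand-in for the lowercased terms an analyzer would have written to the index, and `normalize` stands in for the lowercasing a query parser would apply through the same analyzer; neither is a Lucene API.

```java
import java.util.Locale;
import java.util.Set;

final class CaseNormalizationDemo {

    // Hypothetical stand-in for terms stored by a lowercasing analyzer at index time.
    static final Set<String> INDEXED_TERMS =
            Set.of("main", "nashua", "hillsborough", "new", "hampshire", "united", "states");

    /** Exact, case-sensitive term lookup, as a raw term query would behave. */
    static boolean exactLookup(String term) {
        return INDEXED_TERMS.contains(term);
    }

    /** Stand-in for the normalization the query side must apply to match the index. */
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(exactLookup("MAIN"));            // raw uppercase term
        System.out.println(exactLookup(normalize("MAIN"))); // term after lowercasing
    }
}
```

The raw uppercase term misses while the normalized one hits, which is the reason for either lowercasing the text passed to a hand-built term query or letting a query parser run the analyzer over it.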
>>>>
>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com> wrote:
>>>>
>>>>         BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>
>>>>         // First set
>>>>         booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
>>>>                 BooleanClause.Occur.SHOULD);
>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>                 BooleanClause.Occur.MUST);
>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>                 BooleanClause.Occur.MUST);
>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>                 BooleanClause.Occur.MUST);
>>>>
>>>>         // Second set
>>>>         // booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>         //         BooleanClause.Occur.SHOULD);
>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>         //         BooleanClause.Occur.MUST);
>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>         //         BooleanClause.Occur.MUST);
>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>         //         BooleanClause.Occur.MUST);
>>>>
>>>>         The first set also brings back streets with the Nashua name (NASHUA).
>>>>
>>>>         So, to prevent that, and since I also indexed with street="..."
>>>>         city="...", I tried the second set, but it does not return anything.
>>>>
>>>>         createPhraseQuery builds a PhraseQuery with one term equal to the
>>>>         string in the call.
>>>>
>>>>         Best regards
>>>>
>>>>
>>>>
>>>>         On 6/10/19 10:47 AM, baris.kazar@oracle.com wrote:
>>>>         > How do I check how it is indexed? Lowercase or uppercase?
>>>>         >
>>>>         > The only way now is by testing.
>>>>         >
>>>>         > I am using StandardAnalyzer.
>>>>         >
>>>>         > Best regards
>>>>         >
>>>>         >
>>>>         > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>         >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
>>>>         >> <tomoko.uchida.1111@gmail.com> wrote:
>>>>         >>> Hi,
>>>>         >>>
>>>>         >>> What analyzer do you use for the text field? Is the term "Main"
>>>>         >>> correctly indexed?
>>>>         >> Agreed. Also, it would be good if you could post your actual code.
>>>>         >>
>>>>         >> What analyzer are you using? If you are using StandardAnalyzer, then
>>>>         >> all of your terms while indexing will be lowercased, AFAIK, but your
>>>>         >> query will not be analyzed until you run a QueryParser on it.
>>>>         >>
>>>>         >>
>>>>         >> Atri
>>>>         >>
>>>>         >
>>>>         >
>>>>         >
>>>> ---------------------------------------------------------------------
>>>>
>>>>         > To unsubscribe, e-mail:
>>>> java-user-unsubscribe@lucene.apache.org
>>>> <mailto:java-user-unsubscribe@lucene.apache.org>
>>>>         > For additional commands, e-mail:
>>>>         java-user-help@lucene.apache.org
>>>> <mailto:java-user-help@lucene.apache.org>
>>>>         >
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>



