lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: FuzzyQuery- why is it ignored?
Date Thu, 13 Jun 2019 14:33:39 GMT
Does it treat it like a plural word? :) :) :)
That makes sense.

Best regards


On 6/13/19 10:31 AM, baris.kazar@oracle.com wrote:
> Erick,
>
> Cool, could you show a simple example using my case, please?
>
> Best regards
>
>
>
> On 6/13/19 10:12 AM, Erick Erickson wrote:
>> Shot in the dark: stemming. Whenever I see a problem with something 
>> ending in “s” (or “er” or “ing” or….) my first suspect is that 
>> stemming is turned on. In that case the token in the index that’s 
>> actually searched on is somewhat different than you expect.
>>
>> The test is easy: just ensure your fieldType contains no stemmers.
>> PorterStemmer is particularly aggressive, but to test this case
>> I’d just remove all stemming, re-index and see if the results differ.
>>
>> Best,
>> Erick
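One way to check the stemming hypothesis without re-indexing is to run the analyzer directly and look at the tokens it emits. This is a minimal sketch assuming Lucene 8.x and StandardAnalyzer (the analyzer named later in the thread); the class and method names are illustrative, and the poster's own phraseAnalyzer may behave differently. StandardAnalyzer lowercases but does not stem, so "MAINS" should come out as `mains`, not `main`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative helper (hypothetical name): lists the tokens an analyzer emits.
class AnalyzerTokens {

    static List<String> tokens(Analyzer analyzer, String field, String text) {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                  // must be called before incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                    // must be called before close()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // [mains]: lowercased, but not stemmed
            System.out.println(tokens(analyzer, "contentDFLT", "MAINS"));
            // [street, main, city, nashua]: '=' and '"' are token boundaries
            System.out.println(tokens(analyzer, "contentDFLT", "street=\"MAIN\" city=\"NASHUA\""));
        }
    }
}
```

If a custom analyzer with a stemmer were in use, the first line would show the stemmed form instead.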
>>
>>> On Jun 13, 2019, at 7:26 AM, baris.kazar@oracle.com wrote:
>>>
>>> Tomoko,-
>>>
>>>   That is strange indeed.
>>>
>>> Something is wrong when I use mains, while maink, mainl, mainr, mainq
>>> and maint all work OK: any consonant at the end except s works in this case.
>>>
>>> Case #3 used +contentDFLT:mains~2, not +contentDFLT:"mains~2".
>>>
>>> I am using the fuzzy query with ~ from the QueryParser, and that is not a
>>> PhraseQuery.
>>>
>>> Similarly, a FuzzyQuery built with the input "mains" (it has to be lowercase
>>> since it does not go through StandardAnalyzer) is not a PhraseQuery either.
>>>
>>> Could the docs include a clearer sample case for ComplexPhraseQuery, please?
>>>
>>> Did you also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED
>>> STATES", the expected output in this case?
>>>
>>> Thanks for spending time on this; I would like to thank everyone.
>>>
>>> Best regards
>>>
>>>
>>> On 6/13/19 12:13 AM, Tomoko Uchida wrote:
>>>> Hi,
>>>>
>>>>> OK, I think only this very specific word, "mains", has an issue.
>>>> It looks strange to me. I did some tests locally.
>>>>
>>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>> UNITED STATES".
>>>>
>>>> 2a. This query string (just copied from your Case #3) worked correctly
>>>> for me as far as I can see.
>>>> +contentDFLT:mains~2 +contentDFLT:"nashua",
>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
>>>>
>>>> 2b. However, this query string got no results.
>>>> +contentDFLT:"mains~2", +contentDFLT:"nashua",
>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
>>>> This is expected behaviour, because the classic query parser does not
>>>> support fuzzy queries inside phrase queries (as far as I know).
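The difference between cases 2a and 2b can be seen by printing what the classic QueryParser builds for each string. A sketch assuming Lucene 8.x with StandardAnalyzer and the thread's field name contentDFLT; the class name is illustrative. Unquoted, `~` is the fuzzy operator; quoted, the text is analyzed as a phrase and StandardTokenizer simply splits at `~`.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

// Hypothetical demo class comparing a fuzzy term with the same text in quotes.
class FuzzyVsQuoted {
    static final Analyzer ANALYZER = new StandardAnalyzer();

    // Parses with the classic QueryParser, default field "contentDFLT".
    static Query parse(String queryString) {
        try {
            return new QueryParser("contentDFLT", ANALYZER).parse(queryString);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryString, e);
        }
    }

    public static void main(String[] args) {
        // Unquoted: a FuzzyQuery allowing up to 2 edits.
        Query fuzzy = parse("mains~2");
        // Quoted: a PhraseQuery over the tokens "mains" and "2",
        // which cannot match a document containing "main".
        Query quoted = parse("\"mains~2\"");
        System.out.println(fuzzy.getClass().getSimpleName() + ": " + fuzzy);
        System.out.println(quoted.getClass().getSimpleName() + ": " + quoted);
    }
}
```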
>>>>
>>>> I suspect you are using the fuzzy query operator (~) inside a phrase
>>>> query ("), as in case 2b.
>>>>
>>>> FYI: there is a special parser for such complex phrase queries.
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_complexPhrase_ComplexPhraseQueryParser.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=ZcXpaSlwS5DegX76mHTb_6DH3P7noan1eeMXc-Vh5M8&s=FoIMlcjDO2b7Gut9XRx-NIBWiBQWItsj8IlylJC7Wkc&e=

>>>>
>>>>
>>>> Tomoko
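A minimal sketch of the ComplexPhraseQueryParser linked above, assuming Lucene 8.x with the queryparser module on the classpath. It indexes a single document into an in-memory ByteBuffersDirectory (illustrative scaffolding, not the poster's index) and runs a phrase whose first term is fuzzy; the class and method names are hypothetical, the field name contentDFLT comes from the thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: fuzzy term inside a phrase via ComplexPhraseQueryParser.
class ComplexPhraseDemo {
    static final String FIELD = "contentDFLT";

    // Indexes one document, then counts hits for the given query string.
    static int count(String indexedText, String queryString) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField(FIELD, indexedText, Field.Store.NO));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                Query q = new ComplexPhraseQueryParser(FIELD, analyzer).parse(queryString);
                return new IndexSearcher(reader).count(q);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String text = "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES";
        // "mains~1" should expand to "main", which is directly followed by "nashua".
        System.out.println(count(text, "\"mains~1 nashua\""));
    }
}
```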
>>>>
>>>> On Thu, Jun 13, 2019 at 6:16, <baris.kazar@oracle.com> wrote:
>>>>> OK, I think only this very specific word, "mains", has an issue.
>>>>>
>>>>> All I knew about Lucene was fine :) Great...
>>>>>
>>>>> I have one more question:
>>>>>
>>>>> Which one is advised: FuzzyQuery, or the QueryParser with ~
>>>>> appended to the search string?
>>>>>
>>>>> The second one will go through the analyzer and make the search
>>>>> string lowercase.
>>>>>
>>>>> Best regards
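The practical difference between the two options can be seen by printing the queries. A sketch assuming Lucene 8.x, StandardAnalyzer and a single-word input; the class and method names are hypothetical. A FuzzyQuery built by hand uses the text exactly as given, so "MAINS" is compared against lowercased index terms and cannot match them within 2 edits. Analyzer.normalize applies the same normalization the classic parser applies to fuzzy terms.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.BytesRef;

// Hypothetical comparison of three ways to build the same fuzzy query.
class FuzzyConstruction {
    static final String FIELD = "contentDFLT";
    static final Analyzer ANALYZER = new StandardAnalyzer();

    // 1) Via the classic QueryParser: the term is normalized (lowercased).
    static Query viaParser(String word, int maxEdits) {
        try {
            return new QueryParser(FIELD, ANALYZER)
                    .parse(QueryParser.escape(word) + "~" + maxEdits);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // 2) Directly: no analysis at all, the raw text is the term.
    static Query direct(String word, int maxEdits) {
        return new FuzzyQuery(new Term(FIELD, word), maxEdits);
    }

    // 3) Directly, after normalizing with the same analyzer.
    static Query directNormalized(String word, int maxEdits) {
        BytesRef normalized = ANALYZER.normalize(FIELD, word);
        return new FuzzyQuery(new Term(FIELD, normalized), maxEdits);
    }

    public static void main(String[] args) {
        System.out.println(viaParser("MAINS", 1));        // contentDFLT:mains~1
        System.out.println(direct("MAINS", 1));           // contentDFLT:MAINS~1
        System.out.println(directNormalized("MAINS", 1)); // contentDFLT:mains~1
    }
}
```

Either option 1 or option 3 avoids the case mismatch; option 2 only works if the caller lowercases the input first.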
>>>>>
>>>>>
>>>>> On 6/12/19 1:03 PM, baris.kazar@oracle.com wrote:
>>>>>
>>>>> Hi again,-
>>>>>
>>>>> This is really interesting, and I hope I am missing something.
>>>>> The index lowercases all entries, so case sensitivity is not an
>>>>> issue, I think.
>>>>>
>>>>> Case #1:
>>>>>
>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>         new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>> Query q1 = null;
>>>>> try {
>>>>>     q1 = parser.parse("Main");
>>>>> } catch (ParseException e) {
>>>>>     e.printStackTrace();
>>>>> }
>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>         BooleanClause.Occur.MUST);
>>>>>
>>>>>
>>>>> This brings back the following:
>>>>>
>>>>> query plan:
>>>>>
>>>>> [+contentDFLT:main, +contentDFLT:"nashua", 
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>
>>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after 
>>>>> exec finished)
>>>>>
>>>>> Number of results: 12
>>>>> Name: Main Dunstable Rd
>>>>> Score: 41.204945
>>>>> ID: 12677400
>>>>> Country Code: US
>>>>> Coordinates: 42.72631, -71.50269
>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681980
>>>>> Country Code: US
>>>>> Coordinates: 42.76416, -71.46681
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681973
>>>>> Country Code: US
>>>>> Coordinates: 42.75045, -71.4607
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681974
>>>>> Country Code: US
>>>>> Coordinates: 42.76019, -71.465
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main Dunstable Rd
>>>>> Score: 41.204945
>>>>> ID: 12677399
>>>>> Country Code: US
>>>>> Coordinates: 42.74641, -71.48943
>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES
>>>>>
>>>>> Name: S Main St
>>>>> Score: 41.204945
>>>>> ID: 11893215
>>>>> Country Code: US
>>>>> Coordinates: 42.73412, -71.44797
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681978
>>>>> Country Code: US
>>>>> Coordinates: 42.73492, -71.44951
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: S Main St
>>>>> Score: 41.204945
>>>>> ID: 11893214
>>>>> Country Code: US
>>>>> Coordinates: 42.73958, -71.45895
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681979
>>>>> Country Code: US
>>>>> Coordinates: 42.76416, -71.46681
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.204945
>>>>> ID: 12681977
>>>>> Country Code: US
>>>>> Coordinates: 42.747, -71.45957
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>>
>>>>>
>>>>> Case #2
>>>>>
>>>>> When I did this, it also worked: I added ~ to the word Main to make
>>>>> it a fuzzy query:
>>>>>
>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>         new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>> Query q1 = null;
>>>>> try {
>>>>>     q1 = parser.parse("Main~");
>>>>> } catch (ParseException e) {
>>>>>     e.printStackTrace();
>>>>> }
>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>         BooleanClause.Occur.MUST);
>>>>>
>>>>>
>>>>> query plan:
>>>>>
>>>>> [+contentDFLT:main~2, +contentDFLT:"nashua", 
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>
>>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging stops)
>>>>> Number of results: 12
>>>>> Name: Main Dunstable Rd
>>>>> Score: 41.06405
>>>>> ID: 12677400
>>>>> Country Code: US
>>>>> Coordinates: 42.72631, -71.50269
>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681980
>>>>> Country Code: US
>>>>> Coordinates: 42.76416, -71.46681
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681973
>>>>> Country Code: US
>>>>> Coordinates: 42.75045, -71.4607
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681974
>>>>> Country Code: US
>>>>> Coordinates: 42.76019, -71.465
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main Dunstable Rd
>>>>> Score: 41.06405
>>>>> ID: 12677399
>>>>> Country Code: US
>>>>> Coordinates: 42.74641, -71.48943
>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES
>>>>>
>>>>> Name: S Main St
>>>>> Score: 41.06405
>>>>> ID: 11893215
>>>>> Country Code: US
>>>>> Coordinates: 42.73412, -71.44797
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681978
>>>>> Country Code: US
>>>>> Coordinates: 42.73492, -71.44951
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: S Main St
>>>>> Score: 41.06405
>>>>> ID: 11893214
>>>>> Country Code: US
>>>>> Coordinates: 42.73958, -71.45895
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681979
>>>>> Country Code: US
>>>>> Coordinates: 42.76416, -71.46681
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Main St
>>>>> Score: 41.06405
>>>>> ID: 12681977
>>>>> Country Code: US
>>>>> Coordinates: 42.747, -71.45957
>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Case #3
>>>>>
>>>>> But why does this not work in fuzzy mode? I misspelled the word a bit
>>>>> (1 edit away), and as you saw, the data is there with the Main spelling:
>>>>>
>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>         new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>>
>>>>> Query q1 = null;
>>>>> try {
>>>>>     q1 = parser.parse("Mains~");  // 1 edit away
>>>>> } catch (ParseException e) {
>>>>>     e.printStackTrace();
>>>>> }
>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>         BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>         BooleanClause.Occur.MUST);
>>>>>
>>>>> query plan:
>>>>>
>>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua", 
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>
>>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging stops)
>>>>>
>>>>> Number of results: 0
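The premise of Case #3, that "mains" is one edit away from "main", can be checked with a plain Levenshtein distance. This is an illustrative sketch in plain Java (no Lucene needed); by default FuzzyQuery also counts a transposition of two adjacent characters as a single edit, which this sketch does not, but that makes no difference for the pairs below. The last pair shows that the comparison is on exact characters, so case matters at the term level.

```java
// Hypothetical helper: classic Levenshtein distance
// (insert, delete and substitute each cost 1).
class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                           // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                           // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("mains", "main")); // 1: delete the trailing 's'
        System.out.println(levenshtein("maink", "main")); // 1
        System.out.println(levenshtein("MAINS", "main")); // 5: no characters in common
    }
}
```

So the edit distance of the lowercased term alone would not explain the empty result for "mains~".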
>>>>>
>>>>>
>>>>>
>>>>> Case #4
>>>>>
>>>>> Then I changed q1 above from MUST to SHOULD, and I think the fuzzy
>>>>> query is ignored here, since there is no MAIN in the first 468 results:
>>>>>
>>>>> There is no boost for the Mains term here.
>>>>>
>>>>> query plan:
>>>>>
>>>>> [contentDFLT:mains~2, +contentDFLT:"nashua", 
>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>
>>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging 
>>>>> stops)
>>>>> Number of results: 1794
>>>>> Name: Nashua Dr
>>>>> Score: 34.186226
>>>>> ID: 4974936
>>>>> Country Code: US
>>>>> Coordinates: 42.7636, -71.46063
>>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Nashua River Rail Trl
>>>>> Score: 34.186226
>>>>> ID: 4975508
>>>>> Country Code: US
>>>>> Coordinates: 42.7062, -71.53962
>>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE 
>>>>> UNITED STATES
>>>>>
>>>>> Name: Nashua Rd
>>>>> Score: 33.84896
>>>>> ID: 4975388
>>>>> Country Code: US
>>>>> Coordinates: 42.78746, -71.92823
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: NASHUA
>>>>> Score: 33.84896
>>>>> ID: 21014865
>>>>> Country Code: US
>>>>> Coordinates: 42.75873, -71.46438
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: NASHUA
>>>>> Score: 33.84896
>>>>> ID: 21014865
>>>>> Country Code: US
>>>>> Coordinates: 42.75873, -71.46438
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: NASHUA
>>>>> Score: 33.84896
>>>>> ID: 21014865
>>>>> Country Code: US
>>>>> Coordinates: 42.75873, -71.46438
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: NASHUA
>>>>> Score: 33.84896
>>>>> ID: 21014865
>>>>> Country Code: US
>>>>> Coordinates: 42.75873, -71.46438
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: NASHUA
>>>>> Score: 33.84896
>>>>> ID: 21014865
>>>>> Country Code: US
>>>>> Coordinates: 42.75873, -71.46438
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Nashua St
>>>>> Score: 33.84896
>>>>> ID: 4975671
>>>>> Country Code: US
>>>>> Coordinates: 42.88471, -70.81687
>>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>> Name: Nashua Rd
>>>>> Score: 33.84896
>>>>> ID: 4975400
>>>>> Country Code: US
>>>>> Coordinates: 42.79014, -71.92364
>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>
>>>>>
>>>>> Why is the fuzzy query ignored?
>>>>> Even if I have separate fields for street, city, region and country,
>>>>> this fuzzy query issue will still come up for names with multiple
>>>>> parts, like Main Dunstable, right?
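One way to see what a fuzzy clause actually contributes is to rewrite it against the index: the rewritten query lists the concrete terms it expanded to. FuzzyQuery keeps at most maxExpansions candidate terms (FuzzyQuery.defaultMaxExpansions, 50 in Lucene 8.x), ranked by a boost derived from the edit distance. Whether that cap is what drops "main" for "mains~2" on the full index is only a hypothesis to check, not something established in this thread. A sketch assuming Lucene 8.x; the class names and the in-memory demo index are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical debugging helper for fuzzy term expansion.
class FuzzyExpansionDebug {

    // Rewrites a FuzzyQuery against a real reader so its expanded terms are visible.
    static Query expand(IndexReader reader, String field, String text,
                        int maxEdits, int maxExpansions) throws IOException {
        FuzzyQuery fq = new FuzzyQuery(new Term(field, text), maxEdits,
                FuzzyQuery.defaultPrefixLength, maxExpansions,
                FuzzyQuery.defaultTranspositions);
        return new IndexSearcher(reader).rewrite(fq);
    }

    // Illustrative scaffolding: builds an in-memory index from docs,
    // then returns the rewritten query as a string.
    static String demo(String field, String text, int maxEdits, int maxExpansions,
                       String... docs) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String d : docs) {
                    Document doc = new Document();
                    doc.add(new TextField(field, d, Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return expand(reader, field, text, maxEdits, maxExpansions).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("contentDFLT", "mains", 2, FuzzyQuery.defaultMaxExpansions,
                "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES"));
    }
}
```

On the real index, calling expand(reader, "contentDFLT", "mains", 2, FuzzyQuery.defaultMaxExpansions) and comparing it with maxEdits 1 or a larger maxExpansions would show whether contentDFLT:main is among the kept terms.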
>>>>>
>>>>> Best regards
>>>>>
>>>>> On 6/12/19 11:36 AM, baris.kazar@oracle.com wrote:
>>>>>
>>>>> Tomoko,-
>>>>>
>>>>>   Thank you for your suggestions. I am trying to understand them, and
>>>>> I thought I did :)
>>>>>
>>>>> But it does not work with FuzzyQuery when I use it with a *single*
>>>>> large TextField like street=...value... city=...value...
>>>>> region=...value... country=...value... (with or without quotes around
>>>>> the values).
>>>>>
>>>>> What I knew about Lucene fuzzy queries no longer holds with this
>>>>> TextField form. That is why I suspected a bug.
>>>>>
>>>>> 1. Yes, I saw that and now have solid proof of it.
>>>>>
>>>>> 2. Yes, but FuzzyQuery takes the quotes as they are, since they are
>>>>> escaped and the term is not analyzed.
>>>>>
>>>>> Stuffing everything into one TextField vs having separate fields should
>>>>> probably only affect performance, not the outcome, in my case.
>>>>> But I have been thinking about this, and maybe separate fields are the
>>>>> way to go in this case.
>>>>>
>>>>> My CONTENT field has street names in mixed case and city, region and
>>>>> country names in UPPERCASE. Can this be a problem?
>>>>> I thought the index stored them in lowercase since I am using
>>>>> StandardAnalyzer.
>>>>>
>>>>> The CONTENT field also has the full TextField string street=...
>>>>> city=... region=... country=... (here all values are UPPERCASE).
>>>>>
>>>>> Why can't the index find the names via FuzzyQuery? I tried both
>>>>> FuzzyQuery and the QueryParser, as I showed before.
>>>>>
>>>>> The last piece of advice in your previous email deserves to be outside
>>>>> the parentheses, since it might be very critical :) :) :)
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote:
>>>>>
>>>>> I'd suggest correctly understanding the way a piece of software works
>>>>> before suspecting a bug :-)
>>>>>
>>>>> I guess you may be missing two points:
>>>>>
>>>>> 1. The standard analyzer (standard tokenizer) breaks words at double
>>>>> quotes (U+0022), so quotes are not indexed or searched at all if you
>>>>> are using the standard analyzer. (That is why you get the same results
>>>>> with or without quotes.)
>>>>> See: 
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_8-5F1-5F0_core_org_apache_lucene_analysis_standard_StandardTokenizer.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=8E2lp1YIGM-3v3FspeieGl8z8rEBs6qioTudtFNzh8c&e=
>>>>> and 
>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__unicode.org_reports_tr29_&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=riCZ_f25XW869CKbHPUqfbLiDU-AukE6la0xTLMw6u8&e=
>>>>>
>>>>> 2. The double quote has a special meaning (it's interpreted as a phrase
>>>>> query) in the built-in query parser, so you need to escape it if you
>>>>> want to search for double quotes themselves.
>>>>> See: 
>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_classic_package-2Dsummary.html-23Terms&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=t8OYTgidvcwNpAVFuTsqGhDJK5BwUZVCxc0mPHzqCYU&e=
>>>>>
>>>>> (My advice would be to create a separate field for each key-value
>>>>> pair instead of stuffing all pairs into one text field, if you need to
>>>>> search them separately.)
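A sketch of the separate-fields layout suggested above, assuming Lucene 8.x and StandardAnalyzer. The field names street, city and region, the two sample documents and the class name are hypothetical. Because the fuzzy clause targets only the street field, a street named Nashua no longer satisfies a fuzzy street match through its city value. The fuzzy term is used as-is, so it must already be lowercase.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo: one TextField per address component.
class SeparateFieldsDemo {

    static Document address(String street, String city, String region) {
        Document doc = new Document();
        doc.add(new TextField("street", street, Field.Store.YES));
        doc.add(new TextField("city", city, Field.Store.YES));
        doc.add(new TextField("region", region, Field.Store.YES));
        return doc;
    }

    // Fuzzy (1 edit) on street only; city and region must match exactly.
    static int count(String streetTerm) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                writer.addDocument(address("Main St", "NASHUA", "NEW HAMPSHIRE"));
                writer.addDocument(address("Nashua Dr", "NASHUA", "NEW HAMPSHIRE"));
            }
            Query q = new BooleanQuery.Builder()
                    .add(new FuzzyQuery(new Term("street", streetTerm), 1), BooleanClause.Occur.MUST)
                    .add(new TermQuery(new Term("city", "nashua")), BooleanClause.Occur.MUST)
                    .add(new PhraseQuery("region", "new", "hampshire"), BooleanClause.Occur.MUST)
                    .build();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(q);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("mains"));  // matches only "Main St"
        System.out.println(count("MAINS"));  // no match: the term is not lowercased
    }
}
```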
>>>>>
>>>>> On Wed, Jun 12, 2019 at 2:39, <baris.kazar@oracle.com> wrote:
>>>>>
>>>>> I can say that the quotes are not the issue with the index, as it
>>>>> still gives the same results with or without quotes.
>>>>>
>>>>> I am starting to feel that this might be a bug??
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
>>>>>
>>>>> Somehow the " character is causing an issue, as this should return the street MAIN:
>>>>>
>>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
>>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
>>>>> states"] -> this was with a FuzzyQuery on MAINS
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
>>>>>
>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>> contentDFLT:mains]
>>>>>
>>>>> QueryParser chops it into two pieces from
>>>>> parser.parse("street=\"MAINS\"");
>>>>>
>>>>> The index has a TextField named contentDFLT with the following data:
>>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
>>>>> HAMPSHIRE" country="UNITED STATES"
>>>>>
>>>>>
>>>>> When I set street=\"MAINS~\" with the parser,
>>>>> I get the following:
>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>> contentDFLT:mains]
>>>>>
>>>>> Probably the " quotation marks are messing this up, as you were saying...
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
>>>>>
>>>>> Or, the " (double quotation mark) in your query string may affect query
>>>>> parsing.
>>>>>
>>>>> When I parse this string with the classic query parser (lucene 8.1),
>>>>> street="MAINS~"
>>>>> the parsed (raw) query is
>>>>> text:street text:mains
>>>>> (I set the default search field to "text", so text:xxxx appears
>>>>> here.)
>>>>>
>>>>> Query parsing is a complex process, so it is a good idea to check the
>>>>> parsed raw query string, especially when your query contains (reserved)
>>>>> special characters...
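Checking the parsed query only takes printing Query.toString(). A sketch assuming Lucene 8.x and StandardAnalyzer; the class name is illustrative. The first string reproduces the output reported above: `=` is not field syntax, so `street=` becomes an ordinary term and the quoted part a separate clause. The second uses `:` (the field syntax from the query parser documentation), which yields a fuzzy query on a field named street; that only helps if the index really has such a field.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;

// Hypothetical helper: shows what the classic parser makes of a query string.
class ParsedQueryCheck {
    static final Analyzer ANALYZER = new StandardAnalyzer();

    static String parsed(String defaultField, String queryString) {
        try {
            return new QueryParser(defaultField, ANALYZER).parse(queryString).toString();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + queryString, e);
        }
    }

    public static void main(String[] args) {
        // '=' is just part of a term: two separate SHOULD clauses.
        System.out.println(parsed("text", "street=\"MAINS~\""));  // text:street text:mains
        // ':' selects the field; the fuzzy term is lowercased by the analyzer.
        System.out.println(parsed("text", "street:MAINS~1"));     // street:mains~1
    }
}
```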
>>>>>
>>>>> On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed one small thing in your previous mail.
>>>>>
>>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same results,
>>>>> which is good.
>>>>>
>>>>> To specify a search field, ":" (colon) should be used instead of "=".
>>>>> See the query parser documentation:
>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_classic_package-2Dsummary.html-23Fields&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=u4SeJqH4lePhOazCLwxLEr3WqcMkODtYLv4njiKZ4PM&s=WrNfUXO9gz1PqpczTJw1vD9sWqvr76WRv2Aeo9uWqa4&e=

>>>>>
>>>>>
>>>>>
>>>>> I'm not sure this is related to your problem.
>>>>>
>>>>> On Tue, Jun 11, 2019 at 0:51, <baris.kazar@oracle.com> wrote:
>>>>>
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>>>
>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
>>>>>         new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>>>> Query q1 = null;
>>>>> try {
>>>>>     q1 = parser.parse("MAIN");
>>>>> } catch (ParseException e) {
>>>>>     e.printStackTrace();
>>>>> }
>>>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>>>
>>>>> testQuerySearch2 Time to compute: 0 seconds
>>>>> Number of results: 1775
>>>>> Name: Main St
>>>>> Score: 37.20959
>>>>> ID: 12681979
>>>>> Country Code: US
>>>>> Coordinates: 42.76416, -71.46681
>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>
>>>>> Name: Main St
>>>>> Score: 37.20959
>>>>> ID: 12681977
>>>>> Country Code: US
>>>>> Coordinates: 42.747, -71.45957
>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>
>>>>> Name: Main St
>>>>> Score: 37.20959
>>>>> ID: 12681978
>>>>> Country Code: US
>>>>> Coordinates: 42.73492, -71.44951
>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>
>>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same
>>>>> results, which is good.
>>>>>
>>>>> But when I switch to MAINS~, the fuzzy query does not work.
>>>>>
>>>>>
>>>>> I need to say something about q1 alone in the BooleanQuery:
>>>>> it tries to match MAIN in street, city, region and country, which
>>>>> are all in a single TextField field.
>>>>> But I don't want this. That is why I need to use street="..." etc. when
>>>>> searching.
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>>
>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Just as a basic check: can you find the document without the fuzzy
>>>>> query? I mean, does this query work for you?
>>>>>
>>>>> Query query = parser.parse("MAIN");
>>>>>
>>>>> Tomoko
>>>>>
>>>>> On Tue, Jun 11, 2019 at 0:22, <baris.kazar@oracle.com> wrote:
>>>>>
>>>>> Why doesn't the second set work at all?
>>>>>
>>>>> It is indexed as a TextField like street="..." city="..." etc.
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>>
>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>>
>>>>> I don't know how to use FuzzyQuery with QueryParser, but probably you
>>>>> are suggesting:
>>>>>
>>>>> QueryParser parser = new QueryParser(field, analyzer);
>>>>> Query query = parser.parse("MAINS~2");
>>>>>
>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>>
>>>>> Am I right?
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>>
>>>>> I would suggest using a QueryParser for your fuzzy query before
>>>>> adding it to the Boolean query. This should weed out any case
>>>>> issues.
>>>>>
>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com
>>>>> <mailto:baris.kazar@oracle.com>> wrote:
>>>>>
>>>>>         BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>>
>>>>>         // First set
>>>>>         booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
>>>>>                 BooleanClause.Occur.SHOULD);
>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>                 BooleanClause.Occur.MUST);
>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>                 BooleanClause.Occur.MUST);
>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>                 BooleanClause.Occur.MUST);
>>>>>
>>>>>         // Second set
>>>>>         //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>>         //        BooleanClause.Occur.SHOULD);
>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>>         //        BooleanClause.Occur.MUST);
>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>>         //        BooleanClause.Occur.MUST);
>>>>>         //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>>         //        BooleanClause.Occur.MUST);
>>>>>
>>>>>         The first set also brings back streets named Nashua
>>>>>         (NASHUA).
>>>>>
>>>>>         So, to prevent that, and since I also indexed with street="..."
>>>>>         city="...", I tried the second set, but it does not bring back
>>>>>         anything.
>>>>>
>>>>>         createPhraseQuery builds a PhraseQuery with one term equal to
>>>>>         the string passed in the call.
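Utils.createPhraseQuery is the poster's own helper and its code is not shown. For comparison, Lucene's QueryBuilder.createPhraseQuery analyzes the text and builds the query from the resulting tokens: a PhraseQuery for several tokens, a TermQuery for one. A sketch assuming Lucene 8.x and StandardAnalyzer; the class name is illustrative. Note that the = and quote characters disappear during analysis, which is consistent with the `contentDFLT:"city nashua"` clauses in the query plans in this thread.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

// Hypothetical wrapper around Lucene's QueryBuilder.
class PhraseBuilding {
    static final QueryBuilder BUILDER = new QueryBuilder(new StandardAnalyzer());

    // Several tokens -> PhraseQuery, one token -> TermQuery, no tokens -> null.
    static Query phrase(String field, String text) {
        return BUILDER.createPhraseQuery(field, text);
    }

    public static void main(String[] args) {
        System.out.println(phrase("contentDFLT", "NEW HAMPSHIRE"));    // contentDFLT:"new hampshire"
        System.out.println(phrase("contentDFLT", "city=\"NASHUA\""));  // contentDFLT:"city nashua"
    }
}
```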
>>>>>
>>>>>         Best regards
>>>>>
>>>>>
>>>>>
>>>>>         On 6/10/19 10:47 AM, baris.kazar@oracle.com
>>>>>         <mailto:baris.kazar@oracle.com> wrote:
>>>>>         > How do I check how it is indexed: lowercase or uppercase?
>>>>>         >
>>>>>         > The only way now is by testing.
>>>>>         >
>>>>>         > I am using StandardAnalyzer.
>>>>>         >
>>>>>         > Best regards
>>>>>         >
>>>>>         >
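Besides testing queries, the index can be asked directly: IndexReader.docFreq(Term) returns how many documents contain an exact, unanalyzed term, so it shows whether `main` or `MAIN` was stored. A sketch assuming Lucene 8.x and StandardAnalyzer; it indexes into an in-memory ByteBuffersDirectory only to stay self-contained (for the real index you would open your own Directory with DirectoryReader.open). The class name is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical check of the exact indexed form of a term.
class IndexedTermCheck {
    static final String FIELD = "contentDFLT";

    static int docFreq(String indexedText, String exactTerm) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField(FIELD, indexedText, Field.Store.NO));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                // Exact lookup: no analysis, so case must match the indexed form.
                return reader.docFreq(new Term(FIELD, exactTerm));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "street=\"MAIN\" city=\"NASHUA\"";
        System.out.println(docFreq(text, "main"));  // 1
        System.out.println(docFreq(text, "MAIN"));  // 0
    }
}
```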
>>>>>         > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>>         >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
>>>>>         >> <tomoko.uchida.1111@gmail.com> wrote:
>>>>>         >>> Hi,
>>>>>         >>>
>>>>>         >>> What analyzer do you use for the text field? Is the term "Main"
>>>>>         >>> correctly indexed?
>>>>>         >> Agreed. Also, it would be good if you could post your actual code.
>>>>>         >>
>>>>>         >> What analyzer are you using? If you are using StandardAnalyzer, then
>>>>>         >> all of your terms while indexing will be lowercased, AFAIK, but your
>>>>>         >> query will not be analyzed until you run a QueryParser on it.
>>>>>         >>
>>>>>         >>
>>>>>         >> Atri
>>>>>         >>
>>>>>         >
>>>>>         >
>>>>>         >
>>>>>         > ---------------------------------------------------------------------
>>>>>         > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>         > For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>         >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>



