lucene-java-user mailing list archives

From: Tomoko Uchida <tomoko.uchida.1...@gmail.com>
Subject: Re: FuzzyQuery- why is it ignored?
Date: Thu, 13 Jun 2019 14:40:23 GMT
Sorry, I made a mistake when copy-pasting. Let me correct my previous mail.

> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES".

1. Indexed this text: "MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW
HAMPSHIRE UNITED STATES"

----
As far as I can tell, this query correctly finds the indexed document
(so I have no idea what is wrong with the fuzzy query).
+contentDFLT:mains~2 +contentDFLT:"nashua"
+contentDFLT:"new-hampshire" +contentDFLT:"united states"

I am
- using Lucene 8.1.
- using StandardAnalyzer for both indexing and searching.
- using the classic query parser for parsing.
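
For reference, here is a minimal sketch of why mains~2 is expected to match the indexed term main. It assumes StandardAnalyzer has lowercased both the indexed text and the query term, and it computes a plain Levenshtein distance in standalone Java (no Lucene calls). The names EditDistanceSketch and levenshtein are made up for this illustration. FuzzyQuery itself works with an automaton and, by default, also counts a transposition as a single edit, which this sketch does not model.

```java
public class EditDistanceSketch {

    // Plain Levenshtein distance (insert, delete, substitute) using two DP rows.
    // Hypothetical helper for illustration only; this is not Lucene code.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Query term vs. indexed term, both assumed already lowercased.
        System.out.println(levenshtein("mains", "main")); // prints 1
        System.out.println(levenshtein("maink", "main")); // prints 1
    }
}
```

Both distances are within the 2 edits that ~2 allows, so on edit distance alone mains should be treated no differently from maink.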



On Thu, Jun 13, 2019 at 23:18, <baris.kazar@oracle.com> wrote:
>
> However, the index does not have MAINS but MAIN for the expected entry.
>
> Best regards
>
>
>
> On 6/13/19 10:33 AM, baris.kazar@oracle.com wrote:
> > Does it treat it like a plural word? :) :) :)
> > That makes sense.
> >
> > Best regards
> >
> >
> > On 6/13/19 10:31 AM, baris.kazar@oracle.com wrote:
> >> Erick,
> >>
> >> Cool, could You give a simple example with my example please?
> >>
> >> Best regards
> >>
> >>
> >>
> >> On 6/13/19 10:12 AM, Erick Erickson wrote:
> >>> Shot in the dark: stemming. Whenever I see a problem with something
> >>> ending in “s” (or “er” or “ing” or….) my first suspect is that
> >>> stemming is turned on. In that case the token in the index that’s
> >>> actually searched on is somewhat different than you expect.
> >>>
> >>> The test is easy, just ensure your fieldType contains no stemmers.
> >>> PorterStemmer is particularly aggressive, but for this case to test
> >>> I’d just remove all stemming, re-index and see if the results differ.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>> On Jun 13, 2019, at 7:26 AM, baris.kazar@oracle.com wrote:
> >>>>
> >>>> Tomoko,-
> >>>>
> >>>>   That is strange indeed.
> >>>>
> >>>> Something is wrong when I use mains, but maink, mainl, mainr, mainq
> >>>> and maint all work OK: any consonant at the end except s works in
> >>>> this case.
> >>>>
> >>>> Case #3 had +contentDFLT:mains~2 but not +contentDFLT:"mains~2".
> >>>>
> >>>> i am using fuzzy query with ~ from Query.builder and that is not
> >>>> PhraseQuery.
> >>>>
> >>>> Similarly FuzzyQuery with input "mains" (it has to be lowercase
> >>>> since it does not go through StandardAnalyzer) is also not
> >>>> PhraseQuery.
> >>>>
> >>>> can there be a clearer sample case for ComplexPhraseQuery please in
> >>>> the docs?
> >>>>
> >>>> did You also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED
> >>>> STATES" the expected output in this case?
> >>>>
> >>>> Thanks for spending time on this, i would like to thank everyone.
> >>>>
> >>>> Best regards
> >>>>
> >>>>
> >>>> On 6/13/19 12:13 AM, Tomoko Uchida wrote:
> >>>>> Hi,
> >>>>>
> >>>>>> Ok, i think only this very specific only "mains" has an issue.
> >>>>> It looks strange to me. I did some test locally.
> >>>>>
> >>>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>> UNITED STATES".
> >>>>>
> >>>>> 2a. This query string (just copied from your Case #3) worked
> >>>>> correctly
> >>>>> for me as far as I can see.
> >>>>> +contentDFLT:mains~2 +contentDFLT:"nashua",
> >>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united state"
> >>>>>
> >>>>> 2b. However this query string got no results.
> >>>>> +contentDFLT:"mains~2", +contentDFLT:"nashua",
> >>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
> >>>>> This is expected behaviour because the classic query parser does not
> >>>>> support a fuzzy query inside a phrase query (as far as I know).
> >>>>>
> >>>>> I suspect you are using the fuzzy query operator (~) inside a phrase
> >>>>> query ("), as in the 2b case.
> >>>>>
> >>>>> FYI: there is a special parser for such complex phrase query.
> >>>>> https://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
> >>>>>
> >>>>>
> >>>>> Tomoko
> >>>>>
> >>>>> On Thu, Jun 13, 2019 at 6:16, <baris.kazar@oracle.com> wrote:
> >>>>>> Ok, i think only this very specific only "mains" has an issue.
> >>>>>>
> >>>>>> all i knew about Lucene was fine :) Great...
> >>>>>>
> >>>>>> i have one more question:
> >>>>>>
> >>>>>> which one is advised to use: FuzzyQuery or the Query.parser with
> >>>>>> search string~ appended?
> >>>>>>
> >>>>>> The second one will go through analyzer and make search string
> >>>>>> lowercase.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/12/19 1:03 PM, baris.kazar@oracle.com wrote:
> >>>>>>
> >>>>>> Hi again,-
> >>>>>>
> >>>>>> this is really interesting and i hope i am missing something.
> >>>>>> The index lowercases all entries, so case sensitivity is not an
> >>>>>> issue, I think.
> >>>>>>
> >>>>>> Case #1:
> >>>>>>
> >>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
> >>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
> >>>>>> Query q1 = null;
> >>>>>> try {
> >>>>>>     q1 = parser.parse("Main");
> >>>>>> } catch (ParseException e) {
> >>>>>>     e.printStackTrace();
> >>>>>> }
> >>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>>
> >>>>>> This brings with this:
> >>>>>>
> >>>>>> query plan:
> >>>>>>
> >>>>>> [+contentDFLT:main, +contentDFLT:"nashua",
> >>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
> >>>>>>
> >>>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after
> >>>>>> exec finished)
> >>>>>>
> >>>>>> Number of results: 12
> >>>>>> Name: Main Dunstable Rd
> >>>>>> Score: 41.204945
> >>>>>> ID: 12677400
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.72631, -71.50269
> >>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>>> UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681980
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76416, -71.46681
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681973
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75045, -71.4607
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681974
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76019, -71.465
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main Dunstable Rd
> >>>>>> Score: 41.204945
> >>>>>> ID: 12677399
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.74641, -71.48943
> >>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>>> UNITED STATES
> >>>>>>
> >>>>>> Name: S Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 11893215
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73412, -71.44797
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681978
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73492, -71.44951
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: S Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 11893214
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73958, -71.45895
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681979
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76416, -71.46681
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.204945
> >>>>>> ID: 12681977
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.747, -71.45957
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Case #2
> >>>>>>
> >>>>>> It also worked when I appended ~ to the word Main to make it a
> >>>>>> fuzzy query:
> >>>>>>
> >>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
> >>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
> >>>>>> Query q1 = null;
> >>>>>> try {
> >>>>>>     q1 = parser.parse("Main~");
> >>>>>> } catch (ParseException e) {
> >>>>>>     e.printStackTrace();
> >>>>>> }
> >>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>>
> >>>>>> query plan:
> >>>>>>
> >>>>>> [+contentDFLT:main~2, +contentDFLT:"nashua",
> >>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
> >>>>>>
> >>>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging
> >>>>>> stops)
> >>>>>> Number of results: 12
> >>>>>> Name: Main Dunstable Rd
> >>>>>> Score: 41.06405
> >>>>>> ID: 12677400
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.72631, -71.50269
> >>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>>> UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681980
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76416, -71.46681
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681973
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75045, -71.4607
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681974
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76019, -71.465
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main Dunstable Rd
> >>>>>> Score: 41.06405
> >>>>>> ID: 12677399
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.74641, -71.48943
> >>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>>> UNITED STATES
> >>>>>>
> >>>>>> Name: S Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 11893215
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73412, -71.44797
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681978
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73492, -71.44951
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: S Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 11893214
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73958, -71.45895
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681979
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76416, -71.46681
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 41.06405
> >>>>>> ID: 12681977
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.747, -71.45957
> >>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Case #3
> >>>>>>
> >>>>>> But why does this not work in fuzzy mode when I misspell the word
> >>>>>> slightly (1 edit away)? As you saw, the data is there with the
> >>>>>> spelling Main:
> >>>>>>
> >>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
> >>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
> >>>>>>
> >>>>>> Query q1 = null;
> >>>>>> try {
> >>>>>>     q1 = parser.parse("Mains~");  // 1 edit away
> >>>>>> } catch (ParseException e) {
> >>>>>>     e.printStackTrace();
> >>>>>> }
> >>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>> query plan:
> >>>>>>
> >>>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua",
> >>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
> >>>>>>
> >>>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging
> >>>>>> stops)
> >>>>>>
> >>>>>> Number of results: 0
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Case #4
> >>>>>>
> >>>>>> Then I changed q1 from MUST to SHOULD above, and I think the fuzzy
> >>>>>> query is ignored here, since there is no MAIN in the first 468
> >>>>>> results:
> >>>>>>
> >>>>>> there is no boost for the Mains term here.
> >>>>>>
> >>>>>> query plan:
> >>>>>>
> >>>>>> [contentDFLT:mains~2, +contentDFLT:"nashua",
> >>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
> >>>>>>
> >>>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging
> >>>>>> stops)
> >>>>>> Number of results: 1794
> >>>>>> Name: Nashua Dr
> >>>>>> Score: 34.186226
> >>>>>> ID: 4974936
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.7636, -71.46063
> >>>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Nashua River Rail Trl
> >>>>>> Score: 34.186226
> >>>>>> ID: 4975508
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.7062, -71.53962
> >>>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE
> >>>>>> UNITED STATES
> >>>>>>
> >>>>>> Name: Nashua Rd
> >>>>>> Score: 33.84896
> >>>>>> ID: 4975388
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.78746, -71.92823
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: NASHUA
> >>>>>> Score: 33.84896
> >>>>>> ID: 21014865
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75873, -71.46438
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: NASHUA
> >>>>>> Score: 33.84896
> >>>>>> ID: 21014865
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75873, -71.46438
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: NASHUA
> >>>>>> Score: 33.84896
> >>>>>> ID: 21014865
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75873, -71.46438
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: NASHUA
> >>>>>> Score: 33.84896
> >>>>>> ID: 21014865
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75873, -71.46438
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: NASHUA
> >>>>>> Score: 33.84896
> >>>>>> ID: 21014865
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.75873, -71.46438
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Nashua St
> >>>>>> Score: 33.84896
> >>>>>> ID: 4975671
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.88471, -70.81687
> >>>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>> Name: Nashua Rd
> >>>>>> Score: 33.84896
> >>>>>> ID: 4975400
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.79014, -71.92364
> >>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
> >>>>>>
> >>>>>>
> >>>>>> Why is the fuzzy query ignored?
> >>>>>> Even if i have separate fields for street, city,region, country,
> >>>>>> this fuzzy query issue will come into place for words with
> >>>>>> multiple parts like main dunstable etc., right?
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>> On 6/12/19 11:36 AM, baris.kazar@oracle.com wrote:
> >>>>>>
> >>>>>> Tomoko,-
> >>>>>>
> >>>>>>   Thank You for Your suggestions. i am trying to understand it
> >>>>>> and i thought i did :)
> >>>>>>
> >>>>>> but it does not work with FuzzyQuery when i used with a *single*
> >>>>>> large TextField like street=...value... city=...value...
> >>>>>> region=...value... country=...value... (with or without quotes
> >>>>>> for the values)
> >>>>>>
> >>>>>> What I knew about Lucene fuzzy queries does not hold with this
> >>>>>> TextField form. That is why I suspected a bug.
> >>>>>>
> >>>>>> 1. Yes, i saw and have a solid proof on that now.
> >>>>>>
> >>>>>> 2. Yes, but FuzzyQuery takes the quotes as they are, since they are
> >>>>>> escaped and the term is not analyzed.
> >>>>>>
> >>>>>> Stuffing into one textfield vs having separate fields should only
> >>>>>> affect probably the performance but not the outcome in my case.
> >>>>>> But, i have been thinking about this and maybe it is the way to
> >>>>>> go in this case.
> >>>>>>
> >>>>>> My CONTENT field has street names in mixed case and city, region and
> >>>>>> country names in UPPERCASE. Can this be a problem?
> >>>>>> i thought index stored them in lowercase since i am using
> >>>>>> StandardAnalyzer.
> >>>>>>
> >>>>>> CONTENT field also has full textfield string with street=...
> >>>>>> city=... region=... country=... (here all values are UPPERCASE).
> >>>>>>
> >>>>>> Why can't the index find the names via FuzzyQuery? I tried both
> >>>>>> FuzzyQuery and the query builder, as I showed before.
> >>>>>>
> >>>>>> The last advice in your previous email would nicely go outside
> >>>>>> the parentheses since it might be very critical :) :) :)
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote:
> >>>>>>
> >>>>>> I'd suggest understanding correctly how the software works before
> >>>>>> suspecting a bug in it :-)
> >>>>>>
> >>>>>> I guess you may miss two points:
> >>>>>>
> >>>>>> 1. The standard analyzer (standard tokenizer) breaks words at the double
> >>>>>> quote (U+0022), so quotes are not indexed or searched at all if you are
> >>>>>> using the standard analyzer. (That is the reason you get the same results
> >>>>>> with or without quotes.)
> >>>>>> See:
> >>>>>> https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
> >>>>>> and
> >>>>>> http://unicode.org/reports/tr29/
> >>>>>>
> >>>>>> 2. The double quote has a special meaning (it is interpreted as a phrase
> >>>>>> query) in the built-in query parser, so you need to escape it if you want
> >>>>>> to search for the double quote itself.
> >>>>>> See:
> >>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
> >>>>>>
> >>>>>> (My advice would be to create a separate field for each key-value pair
> >>>>>> instead of stuffing all pairs into one text field, if you need to
> >>>>>> search them separately.)
> >>>>>>
> >>>>>> On Wed, Jun 12, 2019 at 2:39, <baris.kazar@oracle.com> wrote:
> >>>>>>
> >>>>>> I can say that quotes are not the issue with the index, as I still
> >>>>>> get the same results with or without quotes.
> >>>>>>
> >>>>>> i am starting to feel that this might be a bug maybe??
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
> >>>>>>
> >>>>>> Somehow " is causing an issue as this should return street with
> >>>>>> MAIN:
> >>>>>>
> >>>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
> >>>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
> >>>>>> states"] -> this was with fuzzyquery on MAINS
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
> >>>>>>
> >>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
> >>>>>> +contentDFLT:"country united states", contentDFLT:street
> >>>>>> contentDFLT:mains]
> >>>>>>
> >>>>>> QueryParser chops it into two pieces for
> >>>>>> parser.parse("street=\"MAINS\"");
> >>>>>>
> >>>>>> The index has a TextField named contentDFLT with the following data:
> >>>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
> >>>>>> HAMPSHIRE" country="UNITED STATES"
> >>>>>>
> >>>>>>
> >>>>>> When i set street=\"MAINS~\" with parser:
> >>>>>> i get the following
> >>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
> >>>>>> +contentDFLT:"country united states", contentDFLT:street
> >>>>>> contentDFLT:mains]
> >>>>>>
> >>>>>> probably " quotations are messing this up as You were saying...
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
> >>>>>>
> >>>>>> Or, " (double quotation) in your query string may affect query
> >>>>>> parsing.
> >>>>>>
> >>>>>> When I parse this string by classic query parser (lucene 8.1),
> >>>>>> street="MAINS~"
> >>>>>> parsed (raw) query is
> >>>>>> text:street text:mains
> >>>>>> (I set the default search field to "text", so text:xxxx appears
> >>>>>> here.)
> >>>>>>
> >>>>>> Query parsing is a complex process, so it would be good to check
> >>>>>> parsed raw query string especially when you have (reserved) special
> >>>>>> characters in your query...
> >>>>>>
> >>>>>> On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I noticed one small thing in your previous mail.
> >>>>>>
> >>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
> >>>>>>
> >>>>>> which is good.
> >>>>>>
> >>>>>> To specify a search field, ":" (colon) should be used instead of "=".
> >>>>>> See the query parser documentation:
> >>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I'm not sure this is related to your problem.
> >>>>>>
> >>>>>> On Tue, Jun 11, 2019 at 0:51, <baris.kazar@oracle.com> wrote:
> >>>>>>
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "city=\"NASHUA\""),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "country=\"UNITED STATES\""),
> >>>>>>     BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
> >>>>>>     new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
> >>>>>> Query q1 = null;
> >>>>>> try {
> >>>>>>     q1 = parser.parse("MAIN");
> >>>>>> } catch (ParseException e) {
> >>>>>>     e.printStackTrace();
> >>>>>> }
> >>>>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
> >>>>>>
> >>>>>> testQuerySearch2 Time to compute: 0 seconds
> >>>>>> Number of results: 1775
> >>>>>> Name: Main St
> >>>>>> Score: 37.20959
> >>>>>> ID: 12681979
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.76416, -71.46681
> >>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 37.20959
> >>>>>> ID: 12681977
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.747, -71.45957
> >>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>>
> >>>>>> Name: Main St
> >>>>>> Score: 37.20959
> >>>>>> ID: 12681978
> >>>>>> Country Code: US
> >>>>>> Coordinates: 42.73492, -71.44951
> >>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>>
> >>>>>>      when i use q1 = parser.parse("street=\"MAIN\""); i get same
> >>>>>> results
> >>>>>> which is good.
> >>>>>>
> >>>>>> But when i switch to MAINS~ then fuzzy query does not work.
> >>>>>>
> >>>>>>
> >>>>>> I need to say something about using only q1 in the BooleanQuery:
> >>>>>> it tries to match MAIN in street, city, region and country, which are
> >>>>>> all in a single TextField field.
> >>>>>> But I don't want this; that is why I need street="..." etc. when
> >>>>>> searching.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> just for the basic verification, can you find the document without
> >>>>>> fuzzy query? I mean, does this query work for you?
> >>>>>>
> >>>>>> Query query = parser.parse("MAIN");
> >>>>>>
> >>>>>> Tomoko
> >>>>>>
> >>>>>> On Tue, Jun 11, 2019 at 0:22, <baris.kazar@oracle.com> wrote:
> >>>>>>
> >>>>>> Why does the second set not work at all?
> >>>>>>
> >>>>>> it is indexed as Textfield like street="..." city="..." etc.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
> >>>>>>
> >>>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
> >>>>>> probably suggesting
> >>>>>>
> >>>>>> QueryParser parser = new QueryParser(field, analyzer) ;
> >>>>>> Query query = parser.parse("MAINS~2");
> >>>>>>
> >>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
> >>>>>>
> >>>>>> am i right?
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
> >>>>>>
> >>>>>> I would suggest using a QueryParser for your fuzzy query before
> >>>>>> adding it to the Boolean query. This should weed out any case
> >>>>>> issues.
> >>>>>>
> >>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com> wrote:
> >>>>>>
> >>>>>>         BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
> >>>>>>
> >>>>>>         // First set
> >>>>>>         booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
> >>>>>>             BooleanClause.Occur.SHOULD);
> >>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
> >>>>>>             BooleanClause.Occur.MUST);
> >>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
> >>>>>>             BooleanClause.Occur.MUST);
> >>>>>>         booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
> >>>>>>             BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>>         // Second set
> >>>>>>         // booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
> >>>>>>         //     BooleanClause.Occur.SHOULD);
> >>>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
> >>>>>>         //     BooleanClause.Occur.MUST);
> >>>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
> >>>>>>         //     BooleanClause.Occur.MUST);
> >>>>>>         // booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
> >>>>>>         //     BooleanClause.Occur.MUST);
> >>>>>>
> >>>>>>         The first set also brings back streets with the name Nashua
> >>>>>>         (NASHUA).
> >>>>>>
> >>>>>>         So, to prevent that, and since I also indexed with street="..."
> >>>>>>         city="...", I tried the second set, but it does not return anything.
> >>>>>>
> >>>>>>         createPhraseQuery builds a PhraseQuery with one term equal to the
> >>>>>>         string in the call.
> >>>>>>
> >>>>>>         Best regards
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>         On 6/10/19 10:47 AM, baris.kazar@oracle.com wrote:
> >>>>>>         > How do I check how it is indexed? Lowercase or uppercase?
> >>>>>>         >
> >>>>>>         > The only way now is by testing.
> >>>>>>         >
> >>>>>>         > I am using StandardAnalyzer.
> >>>>>>         >
> >>>>>>         > Best regards
> >>>>>>         >
> >>>>>>         >
> >>>>>>         > On 6/9/19 11:57 AM, Atri Sharma wrote:
> >>>>>>         >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
> >>>>>>         >> <tomoko.uchida.1111@gmail.com> wrote:
> >>>>>>         >>> Hi,
> >>>>>>         >>>
> >>>>>>         >>> What analyzer do you use for the text field? Is the term "Main"
> >>>>>>         >>> correctly indexed?
> >>>>>>         >> Agreed. Also, it would be good if you could post your actual code.
> >>>>>>         >>
> >>>>>>         >> What analyzer are you using? If you are using StandardAnalyzer, then
> >>>>>>         >> all of your terms while indexing will be lowercased, AFAIK, but your
> >>>>>>         >> query will not be analyzed until you run a QueryParser on it.
> >>>>>>         >>
> >>>>>>         >> Atri
> >>>>>>         >>
> >>>>>>         >
> >>>>>>         >
> >>>>>>         >
> >>>>>>         > ---------------------------------------------------------------------
> >>>>>>         > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>>         > For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>         >
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

