lucene-java-user mailing list archives

From Tomoko Uchida <tomoko.uchida.1...@gmail.com>
Subject Re: FuzzyQuery
Date Wed, 12 Jun 2019 04:17:03 GMT
I'd suggest making sure you understand how the software works before
suspecting a bug :-)

I think you may be missing two points:

1. The standard analyzer (StandardTokenizer) treats the double quote
(U+0022) as a word boundary, so quotes are neither indexed nor searched
when you use the standard analyzer. (That is why you get the same
results with or without quotes.)
See: https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
and http://unicode.org/reports/tr29/

2. The double quote has a special meaning in the built-in query parser
(it starts a phrase query), so you need to escape it if you want to
search for the double quote character itself.
See: http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
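To make the two points above concrete, here is a rough, stdlib-only Java sketch. `tokenize` is NOT Lucene's StandardTokenizer: it is a simplified assumption that ends a token at any character that is not a letter or digit and lowercases the rest, which is enough to show that `=` and `"` never reach the index. `escape` is a hypothetical helper that backslash-escapes the characters the classic query parser treats as special; in real code, the existing static `QueryParser.escape(String)` does this job. Note that even an escaped quote is still dropped at analysis time by the standard analyzer, per point 1.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryTextDemo {

    // Simplified stand-in for StandardTokenizer + lowercasing (an assumption,
    // not the real UAX#29 rules): any non-letter/digit character ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Characters that have a special meaning in the classic query parser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: prefix every special character with a backslash
    // so the parser reads it literally instead of as syntax.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '=' and '"' are dropped, so quoted and unquoted input index the same way.
        // prints [street, main, city, nashua]
        System.out.println(tokenize("street=\"MAIN\" city=\"NASHUA\""));
        // '"' and '~' are escaped so they are not read as phrase/fuzzy syntax.
        // prints street=\"MAINS\~\"
        System.out.println(escape("street=\"MAINS~\""));
    }
}
```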

(My advice would be to create a separate field for each key-value pair
instead of stuffing all the pairs into one text field, if you need to
search them separately.)
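A minimal sketch of the first step of that advice, assuming the stored search key always has the form `key="value"` separated by spaces, with no `"` inside values (as in the data quoted below). The class and method names are hypothetical. Each resulting entry could then be indexed as its own Lucene field, for example `new TextField(name, value, Field.Store.YES)`, so that a query such as `street:mains~2` only looks at the street.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SearchKeySplitter {
    // Assumed format: key="value" pairs; values contain no double quotes.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Returns the pairs in their original order, e.g. street -> MAIN.
    static Map<String, String> split(String searchKey) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(searchKey);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String key = "street=\"MAIN\" city=\"NASHUA\" municipality=\"HILLSBOROUGH\" "
                + "region=\"NEW HAMPSHIRE\" country=\"UNITED STATES\"";
        // Each entry would become its own field (e.g. a TextField named "street").
        split(key).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```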

On Wed, Jun 12, 2019 at 2:39 <baris.kazar@oracle.com> wrote:
>
> I can say that the quotes are not the issue with the index, as I still
> get the same results with or without quotes.
>
> I am starting to feel that this might be a bug??
>
> Best regards
>
>
> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
> > Somehow the " is causing an issue, as this should return the street MAIN:
> >
> > [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
> > +contentDFLT:"region new-hampshire", +contentDFLT:"country united
> > states"] -> this was with a FuzzyQuery on MAINS
> >
> > Best regards
> >
> >
> > On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
> >> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
> >> +contentDFLT:"country united states", contentDFLT:street
> >> contentDFLT:mains]
> >>
> >> QueryParser chops it into two pieces from
> >> parser.parse("street=\"MAINS\"");
> >>
> >> The index has a TextField named contentDFLT with the following data:
> >> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
> >> HAMPSHIRE" country="UNITED STATES"
> >>
> >>
> >> When I pass street=\"MAINS~\" to the parser,
> >> I get the following:
> >> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
> >> +contentDFLT:"country united states", contentDFLT:street
> >> contentDFLT:mains]
> >>
> >> Probably the " quotation marks are messing this up, as you were saying...
> >> Best regards
> >>
> >>
> >> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
> >>> Or, the " (double quotation mark) in your query string may affect query parsing.
> >>>
> >>> When I parse this string with the classic query parser (Lucene 8.1),
> >>> street="MAINS~"
> >>> the parsed (raw) query is
> >>> text:street text:mains
> >>> (I set the default search field to "text", so text:xxxx appears
> >>> here.)
> >>>
> >>> Query parsing is a complex process, so it is a good idea to check the
> >>> parsed raw query string, especially when you have (reserved) special
> >>> characters in your query...
> >>>
> >>> On Tue, Jun 11, 2019 at 1:10 Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I noticed one small thing in your previous mail.
> >>>>
> >>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same results,
> >>>>> which is good.
> >>>>
> >>>> To specify a search field, ":" (colon) should be used instead of "=".
> >>>> See the query parser documentation:
> >>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
> >>>>
> >>>>
> >>>> I'm not sure this is related to your problem.
> >>>>
> >>>> On Tue, Jun 11, 2019 at 0:51 <baris.kazar@oracle.com> wrote:
> >>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
> >>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST);
> >>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
> >>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
> >>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
> >>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
> >>>>>
> >>>>> org.apache.lucene.queryparser.classic.QueryParser parser =
> >>>>>         new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
> >>>>> Query q1 = null;
> >>>>> try {
> >>>>>     q1 = parser.parse("MAIN");
> >>>>> } catch (ParseException e) {
> >>>>>     e.printStackTrace();
> >>>>> }
> >>>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
> >>>>>
> >>>>> testQuerySearch2 Time to compute: 0 seconds
> >>>>> Number of results: 1775
> >>>>> Name: Main St
> >>>>> Score: 37.20959
> >>>>> ID: 12681979
> >>>>> Country Code: US
> >>>>> Coordinates: 42.76416, -71.46681
> >>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>
> >>>>> Name: Main St
> >>>>> Score: 37.20959
> >>>>> ID: 12681977
> >>>>> Country Code: US
> >>>>> Coordinates: 42.747, -71.45957
> >>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>
> >>>>> Name: Main St
> >>>>> Score: 37.20959
> >>>>> ID: 12681978
> >>>>> Country Code: US
> >>>>> Coordinates: 42.73492, -71.44951
> >>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
> >>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
> >>>>>
> >>>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same
> >>>>> results, which is good.
> >>>>>
> >>>>> But when I switch to MAINS~, the fuzzy query does not work.
> >>>>>
> >>>>>
> >>>>> I need to say something about q1 alone in the BooleanQuery:
> >>>>> it tries to match MAIN in street, city, region and country, which
> >>>>> are all in a single TextField.
> >>>>> But I don't want this. That is why I need to use street="..." etc.
> >>>>> when searching.
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Just as a basic check, can you find the document without the
> >>>>>> fuzzy query? I mean, does this query work for you?
> >>>>>>
> >>>>>> Query query = parser.parse("MAIN");
> >>>>>>
> >>>>>> Tomoko
> >>>>>>
> >>>>>> On Tue, Jun 11, 2019 at 0:22 <baris.kazar@oracle.com> wrote:
> >>>>>>> Why doesn't the second set work at all?
> >>>>>>>
> >>>>>>> It is indexed as a TextField, like street="..." city="..." etc.
> >>>>>>>
> >>>>>>> Best regards
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
> >>>>>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
> >>>>>>>> probably suggesting
> >>>>>>>>
> >>>>>>>> QueryParser parser = new QueryParser(field, analyzer);
> >>>>>>>> Query query = parser.parse("MAINS~2");
> >>>>>>>>
> >>>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
> >>>>>>>>
> >>>>>>>> Am I right?
> >>>>>>>> Best regards
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
> >>>>>>>>> I would suggest using a QueryParser for your fuzzy query before
> >>>>>>>>> adding it to the Boolean query. This should weed out any case
> >>>>>>>>> issues.
> >>>>>>>>>
> >>>>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com
> >>>>>>>>> <mailto:baris.kazar@oracle.com>> wrote:
> >>>>>>>>>
> >>>>>>>>>       BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
> >>>>>>>>>
> >>>>>>>>>       // First set
> >>>>>>>>>       booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
> >>>>>>>>>           BooleanClause.Occur.SHOULD);
> >>>>>>>>>       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
> >>>>>>>>>           BooleanClause.Occur.MUST);
> >>>>>>>>>       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
> >>>>>>>>>           BooleanClause.Occur.MUST);
> >>>>>>>>>       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
> >>>>>>>>>           BooleanClause.Occur.MUST);
> >>>>>>>>>
> >>>>>>>>>       // Second set
> >>>>>>>>>       //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
> >>>>>>>>>       //    BooleanClause.Occur.SHOULD);
> >>>>>>>>>       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
> >>>>>>>>>       //    BooleanClause.Occur.MUST);
> >>>>>>>>>       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
> >>>>>>>>>       //    BooleanClause.Occur.MUST);
> >>>>>>>>>       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
> >>>>>>>>>       //    BooleanClause.Occur.MUST);
> >>>>>>>>>
> >>>>>>>>>       The first set also brings back streets named Nashua
> >>>>>>>>>       (NASHUA).
> >>>>>>>>>
> >>>>>>>>>       So, to prevent that, and since I also indexed with
> >>>>>>>>>       street="..." city="...", I wrote the second set, but it
> >>>>>>>>>       does not bring back anything.
> >>>>>>>>>
> >>>>>>>>>       createPhraseQuery builds a PhraseQuery with one term
> >>>>>>>>>       equal to the string in the call.
> >>>>>>>>>
> >>>>>>>>>       Best regards
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>       On 6/10/19 10:47 AM, baris.kazar@oracle.com
> >>>>>>>>>       <mailto:baris.kazar@oracle.com> wrote:
> >>>>>>>>>       > How do I check how it is indexed, lowercase or uppercase?
> >>>>>>>>>       >
> >>>>>>>>>       > The only way right now is by testing.
> >>>>>>>>>       >
> >>>>>>>>>       > I am using StandardAnalyzer.
> >>>>>>>>>       >
> >>>>>>>>>       > Best regards
> >>>>>>>>>       >
> >>>>>>>>>       >
> >>>>>>>>>       > On 6/9/19 11:57 AM, Atri Sharma wrote:
> >>>>>>>>>       >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
> >>>>>>>>>       >> <tomoko.uchida.1111@gmail.com> wrote:
> >>>>>>>>>       >>> Hi,
> >>>>>>>>>       >>>
> >>>>>>>>>       >>> What analyzer do you use for the text field? Is the
> >>>>>>>>>       >>> term "Main" correctly indexed?
> >>>>>>>>>       >> Agreed. Also, it would be good if you could post your
> >>>>>>>>>       >> actual code.
> >>>>>>>>>       >>
> >>>>>>>>>       >> What analyzer are you using? If you are using
> >>>>>>>>>       >> StandardAnalyzer, then all of your terms will be
> >>>>>>>>>       >> lowercased at indexing time, AFAIK, but your query will
> >>>>>>>>>       >> not be analyzed until you run a QueryParser on it.
> >>>>>>>>>       >>
> >>>>>>>>>       >>
> >>>>>>>>>       >> Atri
> >>>>>>>>>       >>
> >>>>>>>>>       >
> >>>>>>>>>       >
> >>>>>>>>>       >
> >>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>>       > To unsubscribe, e-mail:
> >>>>>>>>> java-user-unsubscribe@lucene.apache.org
> >>>>>>>>> <mailto:java-user-unsubscribe@lucene.apache.org>
> >>>>>>>>>       > For additional commands, e-mail:
> >>>>>>>>>       java-user-help@lucene.apache.org
> >>>>>>>>> <mailto:java-user-help@lucene.apache.org>
> >>>>>>>>>       >
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> >
>
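Atri's point about case explains the original FuzzyQuery symptom in the thread: a FuzzyQuery built directly from `new Term(field, "MAINS")` is not analyzed, while StandardAnalyzer lowercased the indexed term to `main`. The sketch below uses plain Levenshtein distance, which is a simplification (Lucene's FuzzyQuery by default also counts a transposition as one edit, and its default maximum is 2 edits). It shows that `MAINS` is 5 edits away from `main`, since no characters match case-sensitively, while `mains` is only 1 edit away.

```java
final class EditDistanceDemo {
    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    // Simplification: Lucene's FuzzyQuery also allows transpositions by default.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        int maxEdits = 2;        // FuzzyQuery's default maximum number of edits
        String indexed = "main"; // what StandardAnalyzer indexed for "MAIN"
        for (String queryTerm : new String[] {"MAINS", "mains"}) {
            int d = levenshtein(queryTerm, indexed);
            System.out.println(queryTerm + " -> " + indexed + ": " + d
                    + " edits, match=" + (d <= maxEdits));
        }
    }
}
```

In practice, this means lowercasing the term before building the FuzzyQuery yourself, or letting a QueryParser build it, as Atri suggested in the thread.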


