lucene-solr-user mailing list archives

From Tom Devel <deve...@gmail.com>
Subject Re: Search over a multiValued field
Date Wed, 04 Mar 2015 00:03:36 GMT
Erick,

Thanks a lot for the explanation, makes sense now.

Tom

On Tue, Mar 3, 2015 at 5:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> bq: Does it mean that words between " symbols, such as "Orange ordered" are
> treated as a single term, with (implicitly) AND conjunction between them?
>
> not at all. When you quote things, you're getting a "phrase query",
> perhaps one with slop. So something like "a b" means that 'a' must
> appear right next to 'b'. This is something like an AND in the sense
> that both terms must appear, but it is far more restrictive since it
> takes into account the position of the terms in the field.
>
> "a b"~10 means that both words must appear within 10 transpositions in
> the same field. You can think of "transposition" as how many
> intervening terms there are, so something like "a b"~2 would match
> docs with "a x b", but not "a x y z b".
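Erick's description of slop can be illustrated with a small Python sketch. It is a simplified, hypothetical model of sloppy phrase matching for a two-term phrase, not Lucene's code: `phrase_matches` is an invented helper, token positions are list indexes, and whitespace splitting stands in for real analysis. The second term is expected one position after the first, and the phrase matches when the actual distance from that expected spot is at most the slop. Lucene's `PhraseQuery` documentation describes slop as an edit distance in which reversing two words takes two moves, which is why "b a" needs a slop of 2 here.

```python
def phrase_matches(tokens, first, second, slop):
    """Hypothetical helper: does the phrase "first second"~slop match tokens?

    Simplified model of sloppy phrase matching for two terms. A token's
    position is its index in the list. The second term is expected one
    position after the first; the match distance is how far it actually
    lies from that spot, so each intervening term costs 1 and the
    reversed order "second first" costs 2.
    """
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [j for j, t in enumerate(tokens) if t == second]
    return any(abs((j - 1) - i) <= slop for i in firsts for j in seconds)


print(phrase_matches("a x b".split(), "a", "b", 2))      # True: 1 intervening term
print(phrase_matches("a x y z b".split(), "a", "b", 2))  # False: 3 intervening terms
print(phrase_matches("b a".split(), "a", "b", 2))        # True: reversed order costs 2
```

Under this model a slop of 0 is an exact phrase, and every extra intervening term raises the required slop by one.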
>
> And this is where positionIncrementGap comes in. By putting 1000 in
> for it, you guarantee "a b"~999 won't match 'a' in one value of the
> field and 'b' in another.
>
> By contrast, a AND b would match across successive multiValued (MV)
> entries no matter what the gap is.
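The interaction between positionIncrementGap and the two kinds of query can be sketched with Tom's sample document, quoted further down in this thread. This is a hypothetical toy model, not Lucene's indexing code: `tokenize`, `index_positions`, `phrase_matches` and `and_matches` are illustrative names, the tokenizer only lowercases and splits on non-alphanumerics (real text_en analysis also removes stopwords and stems terms), and it assumes `gap` extra positions are added before each value after the first.

```python
import re


def tokenize(text):
    # Toy analyzer (assumption): lowercase and keep runs of letters/digits.
    # Real text_en analysis also removes stopwords and stems terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def index_positions(values, gap):
    """Hypothetical helper: map each term to its positions in a multiValued field.

    Simplified model: each token advances the position by 1, and `gap`
    extra positions are added before every value after the first.
    """
    positions = {}
    pos = -1
    for n, value in enumerate(values):
        if n > 0:
            pos += gap  # the positionIncrementGap between two values
        for term in tokenize(value):
            pos += 1
            positions.setdefault(term, []).append(pos)
    return positions


def phrase_matches(positions, first, second, slop):
    # Two-term sloppy phrase "first second"~slop: the distance of the second
    # term from the spot right after the first must be at most `slop`.
    return any(abs((j - 1) - i) <= slop
               for i in positions.get(first, [])
               for j in positions.get(second, []))


def and_matches(positions, *terms):
    # Boolean AND ignores positions: every term only has to occur somewhere.
    return all(term in positions for term in terms)


# Tom's sample document (the quote characters are part of the field values).
found_field = [
    '"Oranges from South California - ordered"',
    '"Green Apples - available"',
    '"Black Report Books - ordered"',
]

for gap in (0, 1000):
    field_positions = index_positions(found_field, gap)
    print(gap,
          phrase_matches(field_positions, "oranges", "ordered", 10),
          phrase_matches(field_positions, "oranges", "available", 200),
          and_matches(field_positions, "oranges", "available"))
# 0 True True True
# 1000 True False True
```

With a gap of 0, the ~200 phrase pairs "oranges" with "available" from the Apples entry, which is Tom's false positive. With a gap of 1000 those two terms end up over 1000 positions apart, so the phrase no longer matches across entries, while the within-entry "oranges ordered"~10 still does. The AND query matches in both cases because it never looks at positions.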
>
> HTH,
> Erick
>
> On Tue, Mar 3, 2015 at 2:22 PM, Tom Devel <develxy@gmail.com> wrote:
> > Jack,
> >
> > This is exactly what I was looking for, thanks. I found the
> > positionIncrementGap attribute in schema.xml for the text_en field type.
> >
> > I was putting in "AND" because I read in the Solr documentation that "The
> > OR operator is the default conjunction operator."
> >
> > Does it mean that words between " symbols, such as "Orange ordered" are
> > treated as a single term, with (implicitly) AND conjunction between them?
> >
> > Where can I find more info about this?
> >
> > I am currently reading
> > https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
> >
> > Thanks again
> >
> > On Tue, Mar 3, 2015 at 3:58 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> >
> >> Just set the positionIncrementGap for the multivalued field to a much
> >> higher value, like 1000 or 5000. That's the purpose of this attribute:
> >> to ensure that reasonable proximity matches don't match across multiple
> >> values.
> >>
> >> Also, leave "AND" out of the query phrases - you're just trying to match
> >> the product name and availability.
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> On Tue, Mar 3, 2015 at 4:51 PM, Tom Devel <develxy@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am running Solr 5.0.0 and have a question about proximity search and
> >> > multiValued fields.
> >> >
> >> > I am indexing XML files of the following form, with foundField defined
> >> > as a multiValued text_en field in my schema.xml.
> >> >
> >> > <?xml version="1.0" encoding="UTF-8"?>
> >> > <add><doc>
> >> > <field name="id">8</field>
> >> > <field name="foundField">"Oranges from South California - ordered"</field>
> >> > <field name="foundField">"Green Apples - available"</field>
> >> > <field name="foundField">"Black Report Books - ordered"</field>
> >> > </doc></add>
> >> >
> >> > There are several such documents, and, for instance, I would like to
> >> > query all documents whose foundField contains "Oranges" and "ordered".
> >> > The following proximity query takes care of it:
> >> >
> >> > q=foundField:("oranges AND ordered"~2)
> >> >
> >> > However, a field could have more words, and I cannot know the
> >> > proximity of the desired query words in advance. Setting the proximity
> >> > value too high results in false positives: the following query also
> >> > returns the document, although "available" was in the entry about Apples:
> >> >
> >> > foundField:("oranges AND available"~200)
> >> >
> >> > I do not think that tweaking a proximity value is the correct approach.
> >> >
> >> > How can I restrict a search on a multiValued field so that the terms
> >> > must match within a single value, without running into this problem?
> >> >
> >> > Many thanks for any help
> >> >
> >>
>
