lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Search over a multiValued field
Date Tue, 03 Mar 2015 23:54:12 GMT
bq: Does it mean that words between " symbols, such as "Orange ordered" are
treated as a single term, with (implicitly) AND conjunction between them?

Not at all. When you quote things, you're getting a "phrase query", perhaps one
with slop. So something like "a b" means that 'a' must appear right next to 'b'.
This is something like an AND in the sense that both terms must appear, but it
is far more restrictive since it takes into account the position of the terms
in the field.

"a b"~10 means that both words must appear within 10 transpositions in
the same field. You can think of "transposition" as how many intervening
terms there are, so something like "a b"~2 would match docs with "a x b"
(one intervening term), but not "a x y z b" (three intervening terms).
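
To make the counting concrete, here is a rough Python sketch. It is a simplified
model, not Lucene's actual matcher: it only handles 'a' appearing before 'b',
treats each whitespace-separated token as one position, and the function name
min_slop_in_order is made up for illustration.

```python
def min_slop_in_order(tokens, first, second):
    """Smallest slop at which "first second"~N could match, in-order only.

    Simplified model: each token sits at its list index, and the slop
    needed is the number of intervening positions. Returns None if
    `second` never occurs after `first`.
    """
    best = None
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] == second:
                gap = j - i - 1  # number of intervening terms
                if best is None or gap < best:
                    best = gap
                break  # nearest following occurrence is the best for this i
    return best


print(min_slop_in_order("a x b".split(), "a", "b"))      # 1, so "a b"~2 matches
print(min_slop_in_order("a x y z b".split(), "a", "b"))  # 3, so "a b"~2 does not
```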

And this is where positionIncrementGap comes in. By putting 1000 in for it,
you guarantee "a b"~999 won't match 'a' in one value of the multiValued field
and 'b' in another.

Whereas a AND b would match across successive MV entries no matter what the
gap.
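
Here is a small Python sketch of why the gap matters, using two of the values
from Tom's document. It is only a model under stated assumptions: tokens are
lowercased and split on whitespace (no stemming or stopword removal, and the
"-" is dropped by hand), each value after the first starts `gap` extra
positions after the previous one, and only in-order matches are considered.
The helper names and the gap of 100 in the first call are illustrative, not
taken from Solr.

```python
def assign_positions(values, gap):
    """Map each token to its positions, adding `gap` between successive values."""
    positions = {}
    pos = -1
    for n, value in enumerate(values):
        if n > 0:
            pos += gap  # the positionIncrementGap between multiValued entries
        for tok in value.lower().split():
            pos += 1
            positions.setdefault(tok, []).append(pos)
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 < b - a <= slop + 1
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


def and_match(positions, first, second):
    """A plain AND only asks whether both terms occur anywhere in the field."""
    return first in positions and second in positions


values = ["Oranges from South California ordered", "Green Apples available"]

small = assign_positions(values, gap=100)
large = assign_positions(values, gap=1000)

print(sloppy_match(small, "oranges", "available", 200))  # True: false positive
print(sloppy_match(large, "oranges", "available", 200))  # False: gap blocks it
print(sloppy_match(large, "oranges", "ordered", 200))    # True: same value
print(and_match(large, "oranges", "available"))          # True: AND ignores the gap
```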

HTH,
Erick

On Tue, Mar 3, 2015 at 2:22 PM, Tom Devel <develxy@gmail.com> wrote:
> Jack,
>
> This is exactly what I was looking for, thanks. I found the
> positionIncrementGap attribute in the schema.xml for the text_en field type.
>
> I was putting in "AND" because I read in the Solr documentation that "The
> OR operator is the default conjunction operator."
>
> Does it mean that words between " symbols, such as "Orange ordered" are
> treated as a single term, with (implicitly) AND conjunction between them?
>
> Where could I find more info about this?
>
> I am currently reading
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
>
> Thanks again
>
> On Tue, Mar 3, 2015 at 3:58 PM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
>> Just set the positionIncrementGap for the multivalued field to a much
>> higher value, like 1000 or 5000. That's the purpose of this attribute, to
>> assure that reasonable proximity matches don't match across multiple
>> values.
>>
>> Also, leave "AND" out of the query phrases - you're just trying to match
>> the product name and availability.
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Mar 3, 2015 at 4:51 PM, Tom Devel <develxy@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I am running Solr 5.0.0 and have a question about proximity search and
>> > multiValued fields.
>> >
>> > I am indexing xml files of the following form with foundField being a field
>> > defined as multiValued and text_en in my schema.xml.
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <add><doc>
>> > <field name="id">8</field>
>> > <field name="foundField">"Oranges from South California - ordered"</field>
>> > <field name="foundField">"Green Apples - available"</field>
>> > <field name="foundField">"Black Report Books - ordered"</field>
>> > </doc></add>
>> >
>> > There are several such documents, and for instance, I would like to query
>> > all documents having in the foundField "Oranges" and "ordered". The
>> > following proximity query takes care of it:
>> >
>> > q=foundField:("oranges AND ordered"~2)
>> >
>> > However, a field could have more words, and I also cannot know the
>> > proximity of the desired query words in advance. Setting the proximity
>> > value too high results in false positives: the following query also
>> > returns the document (although "available" was in the entry about Apples):
>> >
>> > foundField:("oranges AND available"~200)
>> >
>> > I do not think that tweaking a proximity value is the correct approach.
>> >
>> > How can I search to match contents in a multiValued field per Value as
>> > described above, without running into the problem?
>> >
>> > Many thanks for any help
>> >
>>
