lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Kate Winslet vs Winslet Kate
Date Sun, 01 Nov 2015 06:40:18 GMT
Yeah, that's actually a tough one. You have no control over what the user
types, so you have to try to guess what they meant.

To do that right, you really need some metadata besides what the user
typed in, i.e. you need to recognize that "kate" and "winslet" are proper
names and "movies" is something else, then break up the query
appropriately behind the scenes.
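
As a rough illustration of that "break up the query" idea, here is a minimal
Python sketch. The KNOWN_NAME_TOKENS set and both function names are
hypothetical; a real system would get that metadata from somewhere else
(for instance the indexed person names), and this does no escaping of
special query characters.

```python
# Hypothetical metadata: tokens we believe are parts of person names.
KNOWN_NAME_TOKENS = {"kate", "winslet"}


def split_query(user_input, name_tokens=KNOWN_NAME_TOKENS):
    """Separate tokens recognized as names from everything else."""
    names, others = [], []
    for token in user_input.split():
        if token.lower() in name_tokens:
            names.append(token)
        else:
            others.append(token)
    return names, others


def build_q(user_input):
    """Build a q value: name tokens go to the name field, the rest
    is left unfielded so it hits the default search field."""
    names, others = split_query(user_input)
    clauses = []
    if names:
        clauses.append("name:(" + " AND ".join(names) + ")")
    if others:
        clauses.append("(" + " AND ".join(others) + ")")
    return " AND ".join(clauses)
```

With this, "Kate Winslet movie" no longer requires "movie" to appear in the
name field.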

edismax might help here. You could use copyField to copy everything into a
bag_of_words field, then boost the name field quite high relative to the
bag_of_words field. That way, _assuming_ the bag_of_words field had all
three words, the user at least gets something.
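
A sketch of what those request parameters might look like, built in Python.
The boost of 10 and the function name are made-up placeholders, and it
assumes the schema already copies the other fields into bag_of_words.

```python
from urllib.parse import urlencode

# Assumes the schema has something along the lines of
#   <copyField source="*" dest="bag_of_words"/>
# so that bag_of_words holds the text of every other field.


def edismax_params(user_input, name_boost=10):
    """Search both fields, weighting matches in name more heavily.
    The boost value is an arbitrary starting point to tune."""
    return {
        "defType": "edismax",
        "q": user_input,
        "qf": f"name^{name_boost} bag_of_words",
    }


# Append this to the select handler URL of your own core.
query_string = urlencode(edismax_params("Kate Winslet movie"))
```

Adding debug=true to the request shows how the boosts end up in the parsed
query.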

You can also do some tricks with edismax and the "pf" parameters. That
option automatically takes the input and makes a phrase out of it against
the field, so you get better scores for, say, the name field if it contains
the phrase "kate winslet". It doesn't help with the "kate winslet movies"
query, though.
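
For example, a pf boost could be added to the request roughly like this
(Python, parameters only). The boost values and the function name are
illustrative guesses. "ps" is the edismax phrase slop applied to the pf
phrase; whether a given slop is enough to match names stored in the other
order is something to check with debug=true.

```python
from urllib.parse import urlencode


def edismax_phrase_params(user_input, name_boost=10, phrase_boost=50,
                          phrase_slop=None):
    """qf decides which documents match; pf only adds score when the
    whole input also appears as a phrase in the name field."""
    params = {
        "defType": "edismax",
        "q": user_input,
        "qf": f"name^{name_boost} bag_of_words",
        "pf": f"name^{phrase_boost}",
    }
    if phrase_slop is not None:
        # Optional: allow some distance between the phrase terms.
        params["ps"] = str(phrase_slop)
    return params


query_string = urlencode(edismax_phrase_params("kate winslet"))
```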

On Sat, Oct 31, 2015 at 11:11 PM, Daniel Valdivia
<hola@danielvaldivia.com> wrote:
> Perhaps
>
> q=name:("Kate AND Winslet")
>
> q=name:("Kate Winslet")
>
> Sent from my iPhone
>
>> On Oct 31, 2015, at 10:21 PM, Yangrui Guo <guoyangrui@gmail.com> wrote:
>>
>> Thanks for the reply. Putting the name: before the terms did the trick. I
>> just wanted to generalize the search query because users might be
>> interested in querying Kate Winslet herself or her movies. If a user
>> enters the query string "Kate Winslet movie", the query
>> q=name:(Kate AND Winslet AND movie) will return nothing.
>>
>> Yangrui Guo
>>
>> On Saturday, October 31, 2015, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>>> There are a couple of anomalies here.
>>>
>>> 1> kate AND winslet
>>> What does the query look like if you add &debug=true to the statement
>>> and look at the "parsed_query" section of the return?  My guess is you
>>> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
>>> default_search_field:winslet" and are getting matches you don't
>>> expect. You need something like "q=name:(kate AND winslet)" or
>>> "q=name:kate AND name:winslet". Note that if you're using edismax it's
>>> more complicated, but that should still honor the intent.
>>>
>>> 2> I have no idea why searching for "Kate Winslet" in quotes returns
>>> anything. I wouldn't expect it to, unless you mean you typed in "q=kate
>>> winslet", which searches against your default field, not the name
>>> field.
>>>
>>> Best,
>>> Erick
>>>
>>> On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo <guoyangrui@gmail.com>
>>> wrote:
>>>> Hi, today I found an interesting aspect of Solr. I imported IMDB data
>>>> into Solr. IMDB puts the last name before the first name in its person
>>>> name field, e.g. "Winslet, Kate". When I search "Winslet Kate" with
>>>> quotation marks I get the exact result. However, if I search "Kate
>>>> Winslet" or Kate AND Winslet, Solr seems to return all results
>>>> containing either Kate or Winslet, which is similar to "Winslet
>>>> Kate"~999999. From the user's perspective I certainly want Solr to
>>>> treat Kate Winslet the same as Winslet Kate. Is there any way to make
>>>> Solr score higher for terms in the same field?
>>>>
>>>> Yangrui
>>>
