lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: how to do partial word searches?
Date Wed, 25 Nov 2009 14:03:02 GMT
Confession: I haven't had occasion to use the n-gram thingy, but here's the
theory. And note that Solr has n-gram tokenizers available.

Using a 2-gram example for sullivan, the n-gram tokenizer would index these
tokens: su, ul, ll, li, iv, va, an. Then at query time in your example, sulli
would be broken up into su, ul, ll and li, which, when searched as a phrase,
would match your field.
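
To make the theory concrete, here is a minimal Python sketch of the 2-gram
idea. It is not Solr code: char_ngrams and phrase_match are hypothetical
helper names invented for this illustration, and the "phrase" check is
approximated as "the query tokens appear as one contiguous run in the indexed
tokens". How a real Solr analyzer chain and phrase query behave depends on
the field type configuration.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def phrase_match(indexed_tokens, query_tokens):
    """True if query_tokens occur as one contiguous run in indexed_tokens.

    This only approximates what a phrase search over n-gram tokens does.
    """
    m = len(query_tokens)
    if m == 0:
        return False
    return any(
        indexed_tokens[i:i + m] == query_tokens
        for i in range(len(indexed_tokens) - m + 1)
    )


indexed = char_ngrams("sullivan")  # tokens stored at index time
query = char_ngrams("sulli")       # tokens produced at query time

print(indexed)                       # ['su', 'ul', 'll', 'li', 'iv', 'va', 'an']
print(query)                         # ['su', 'ul', 'll', 'li']
print(phrase_match(indexed, query))  # True
```

Note that under this scheme "sully" would no longer match "sullivan", since
its last 2-gram (ly) never appears among the indexed tokens.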

The expense, of course, is that your index is larger (but surprisingly not as
much as you'd think). In exchange, your queries are much faster than the
equivalent wildcard queries.

That's the theory anyway, the practice is "left as an exercise for the
reader"<G>

But "the folks" generously provided quite an explication of what wildcards
are all about on the *lucene* user's list; look for a thread titled
"I just don't get wildcards at all" from around 2006. It's nice background
for what the underlying problem is, and I think some of the Solr tokenizers
implement some of these ideas. The state of the art has progressed
considerably since then, but the underlying issues are still there.

Sorry I can't be more help here.
Erick

On Wed, Nov 25, 2009 at 8:18 AM, Joel Nylund <jnylund@yahoo.com> wrote:

> Hi Erick,
>
> thanks for the links, I read both of them and I still have no idea what to
> do, lots of back and forth, but didn't see any solution on it.
>
> One person talked about indexing the field in reverse and doing and ON on
> it, this might work I guess.
>
> thanks
> Joel
>
>
>
> On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:
>
>> copying from Erik Hatcher:
>>
>> See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
>> does not have leading wildcard support enabled.
>>
>> There's a pretty extensive recent exchange on this, see the
>> thread on the user's list titled
>>
>> "leading and trailing wildcard query"
>>
>> Best
>> Erick
>>
>> On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund <jnylund@yahoo.com> wrote:
>>
>>> Hi, I saw some older postings on this, but didn't see a resolution.
>>>
>>> I have a field called title, I would like to be able to find partial word
>>> matches within the title.
>>>
>>> For example:
>>>
>>> http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22
>>>
>>> I would expect it to find:
>>> <str name="textTitle">the daily dish | by andrew sullivan</str>
>>>
>>> but it doesn't. It does find sully (which is fine with me also as a bonus),
>>> but doesn't seem to get any of the partial word stuff. Oddly enough, before
>>> I lowercased the title, the wildcard matching seemed to work a bit better;
>>> it just didn't deal with the case-sensitive query.
>>>
>>> At first I had mixed case titles and I read that the wildcard doesn't work
>>> with mixed case, so I created another field that is a lowercased version of
>>> the title called "textTitle"; it is of type text.
>>>
>>> Is it possible with solr to achieve what I am trying to do, if so how? If
>>> not, anything closer than what I have?
>>>
>>> thanks
>>> Joel
>>>
>>>
>>>
>
