lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Search for ISBN-like identifiers
Date Thu, 05 Jan 2017 17:58:11 GMT
On 1/5/2017 3:08 AM, Sebastian Riemer wrote:
> I now face the problem, that searching for a book with
> text:978-3-8052-5094-8* does not return the single result I expect.
> However searching for text:9783805250948* instead returns a result.
> Note, that I am adding a wildcard at the end automatically, to further
> broaden the resultset. Note also, that it does not seem to matter
> whether I put backslashes in front of the hyphen or not (to be exact,
> when sending via SolrJ from my application, I put in the backslashes,
> but I don't see a difference when using SolrAdmin as I guess SolrAdmin
> automatically inserts backslashes if needed?) 

As soon as you use a wildcard, the query term is no longer run through
the analysis chain, so it keeps all of its hyphens.  That term will never
match anything in the index, because the StandardTokenizer has already
removed the hyphens from the tokens that it puts into the index.  The
fact that wildcards skip analysis is a major source of confusion.  I
assume that skipping analysis is required for correct operation,
although I have never delved that deeply into the internals.
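
Since no analyzer will normalize a wildcard term, one possible workaround is to normalize it in the client before appending the wildcard. The sketch below assumes the indexed terms hold the digits with no hyphens, as the working query text:9783805250948* suggests. The class and method names are made up for illustration and are not part of SolrJ.

```java
public class IsbnWildcard {
    /**
     * Strips hyphens and whitespace from the user input, then appends a
     * trailing wildcard. Assumption: the index holds hyphen-free terms.
     */
    public static String toPrefixQuery(String field, String userInput) {
        String normalized = userInput.replaceAll("[\\s-]", "");
        return field + ":" + normalized + "*";
    }

    public static void main(String[] args) {
        // prints text:9783805250948*
        System.out.println(toPrefixQuery("text", "978-3-8052-5094-8"));
    }
}
```

This only helps when the whole identifier ends up as a single indexed term; if the analysis chain leaves the groups as separate tokens, a hyphen-free prefix would still not match.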

A hyphen is only a special character if it's the first character in a
word.  It's generally a good idea to escape the special characters
anyway, but in this case it doesn't matter, which is why you can send it
unescaped.

If you want to use wildcards, you're going to have to use them on an
untokenized (normally "string") field, or the results will probably not
be what you expect.
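
A minimal sketch of the string-field approach, assuming the schema has an untokenized field holding the identifier exactly as entered (the name isbn_s is hypothetical). Because the hyphens are then part of the indexed value, the query keeps them and only escapes the query-syntax characters. SolrJ ships ClientUtils.escapeQueryChars for this purpose; the hand-rolled escape below is a simplified stand-in so the example runs without SolrJ on the classpath.

```java
public class StringFieldPrefix {
    // Characters treated as special by the classic Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes special characters and whitespace. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds field:escapedPrefix* with the wildcard left unescaped. */
    public static String prefixQuery(String field, String prefix) {
        return field + ":" + escape(prefix) + "*";
    }

    public static void main(String[] args) {
        // prints isbn_s:978\-3\-8052*
        System.out.println(prefixQuery("isbn_s", "978-3-8052"));
    }
}
```

The wildcard is appended after escaping so that it stays a wildcard; any asterisk the user typed is escaped and matched literally.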

Thanks,
Shawn
