lucene-dev mailing list archives

From "Alexandre Rafalovitch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-7154) Wildcard query matches special characters
Date Tue, 24 Feb 2015 20:22:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335379#comment-14335379 ]

Alexandre Rafalovitch commented on SOLR-7154:
---------------------------------------------

Which version of Solr is this against? This needs to be tested against version 5, or at least
4.10.3, to be actionable.

Also, the bug as described seems simple enough to reproduce with a small, complete use case.
It would be useful to have that.
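
A minimal sketch of what such a use case could look like, in Python. The base URL, the core
name "collection1" and the "id" field are assumptions for illustration, not taken from the
issue. The sketch only builds the update payload and the query URL and does not send anything.
Indexing both the precomposed and the decomposed spelling would show which one the wildcard
actually matches.

```python
import json
import unicodedata
from urllib.parse import urlencode

# Assumed values, not from the issue: adjust to the local setup.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "collection1"


def build_repro():
    """Return (update_url, update_body, select_url) for the test case."""
    composed = unicodedata.normalize("NFC", "beyonce\u0301")  # é as U+00E9
    decomposed = unicodedata.normalize("NFD", composed)       # e + U+0301
    # The "id" field is hypothetical; use the schema's own unique key.
    docs = [
        {"id": "nfc", "raw_name": composed},
        {"id": "nfd", "raw_name": decomposed},
    ]
    update_url = f"{SOLR_BASE}/{CORE}/update?commit=true"
    # ensure_ascii=True keeps the two spellings visibly distinct as \u escapes.
    update_body = json.dumps(docs, ensure_ascii=True)
    params = urlencode({"q": "raw_name:beyonce*", "wt": "json", "fl": "id,raw_name"})
    select_url = f"{SOLR_BASE}/{CORE}/select?{params}"
    return update_url, update_body, select_url


update_url, update_body, select_url = build_repro()
print(update_url)
print(update_body)
print(select_url)
```

The body would be posted to the update URL as JSON, and the select URL fetched afterwards.
If only the "nfd" document comes back, the match depends on how the accented character was
encoded at index time.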

> Wildcard query matches special characters
> -----------------------------------------
>
>                 Key: SOLR-7154
>                 URL: https://issues.apache.org/jira/browse/SOLR-7154
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Arun Rangarajan
>            Priority: Minor
>
> I have a string field raw_name defined like this:
> {code}
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> ...
> <field name="raw_name" type="string" indexed="true" stored="true" />
> {code}
> I have a document like this:
> {code}
> {raw_name: beyoncé}
> {code}
> Notice that the last character is a special character (accented e).
> When I issue this wildcard query:
> {code}
> q=raw_name:beyonce*
> {code}
> i.e. with the last character simply being the ASCII 'e', Solr returns the above document.
> Exact query:
> {code}
> /select?q=raw_name:beyonce*&wt=json&fl=raw_name
> {code}
> Response:
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 0,
>     "params": {
>       "fl": "raw_name",
>       "q": "raw_name:beyonce*",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 2,
>     "start": 0,
>     "docs": [
>       {
>         "raw_name": "beyoncé"
>       },
>       {
>         "raw_name": "beyoncé"
>       }
>     ]
>   }
> }
> {code}
> I used the analysis tool in Solr admin (with Jetty). The raw bytes look like this:
> Raw bytes for beyonce: [62 65 79 6f 6e 63 65]
> Raw bytes for beyoncé: [62 65 79 6f 6e 63 65 cc 81]
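> So the bytes seem to explain why beyonce* might match beyoncé: the stored value ends in an
> ASCII 'e' followed by a combining acute accent (cc 81, i.e. U+0301), so the bytes of
> beyonce are a prefix of the stored bytes.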
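
The byte listing above can be reproduced outside Solr. The following Python sketch is a
model of the comparison, not Solr code: it assumes the field value is kept verbatim as a
single term and that a trailing-wildcard query behaves like a byte-wise prefix test on
that term. The helper names are made up for illustration.

```python
import unicodedata


def utf8_hex(text):
    """Render the UTF-8 bytes of text as space-separated hex."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def prefix_matches(prefix, value):
    """Byte-wise prefix test, standing in for a prefix query on an unanalyzed term."""
    return value.encode("utf-8").startswith(prefix.encode("utf-8"))


query_prefix = "beyonce"
decomposed = "beyonce\u0301"                         # 'e' + combining acute accent
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(utf8_hex(query_prefix))  # 62 65 79 6f 6e 63 65
print(utf8_hex(decomposed))    # 62 65 79 6f 6e 63 65 cc 81
print(utf8_hex(composed))      # 62 65 79 6f 6e 63 c3 a9

print(prefix_matches(query_prefix, decomposed))  # True
print(prefix_matches(query_prefix, composed))    # False
```

Under these assumptions the match follows from the decomposed (NFD) encoding of the stored
value. The same name in precomposed (NFC) form would not match beyonce*.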



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

