lucene-solr-user mailing list archives

From "Mike Richmond" <richmondm...@gmail.com>
Subject Custom E-mail Tokenizer
Date Wed, 21 Jun 2006 18:50:28 GMT
I have created a custom e-mail tokenizer and am trying to make e-mail
addresses more searchable inside of Solr (without having to rely on
wildcard/prefix queries), but I am running into a couple of problems
using it.

I created a tokenizer that, given the e-mail address
"java-user@lucene.apache.org", produces the following tokens (this
was discussed on the Java Lucene users group and can be found here:
http://www.nabble.com/indexing-emails-t1800267.html#a4932444):
    java-user@lucene.apache.org
    java
    user
    java-user
    lucene.apache.org
    lucene
    apache.org
    org
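
For illustration, the expansion above can be sketched in plain Java with no Lucene dependency. This is not the tokenizer from the linked thread: the class name EmailTokenSketch and the method tokenize are hypothetical, and splitting the local part only on hyphens is an assumption drawn from the single example above. Lowercasing is left out because the schema handles it with a separate filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EmailTokenSketch {

    // Hypothetical helper: returns the tokens for one address,
    // in the same order as the list above.
    static List<String> tokenize(String address) {
        // LinkedHashSet keeps insertion order and drops duplicates
        // (e.g. when the local part contains no hyphen).
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(address);

        int at = address.indexOf('@');
        if (at < 0) {
            // No '@': treat the input as a single token.
            return new ArrayList<>(tokens);
        }
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);

        // Hyphen-separated pieces of the local part, then the whole local part.
        for (String piece : local.split("-")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        if (!local.isEmpty()) {
            tokens.add(local);
        }

        // Whole domain, its first label, then each shorter suffix.
        if (!domain.isEmpty()) {
            tokens.add(domain);
        }
        int dot = domain.indexOf('.');
        if (dot > 0) {
            tokens.add(domain.substring(0, dot));
        }
        while (dot >= 0) {
            String suffix = domain.substring(dot + 1);
            if (!suffix.isEmpty()) {
                tokens.add(suffix);
            }
            dot = domain.indexOf('.', dot + 1);
        }
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        // Prints one token per line for the example address.
        for (String token : tokenize("java-user@lucene.apache.org")) {
            System.out.println(token);
        }
    }
}
```

Under the same assumptions, "richmondmike@gmail.com" would expand to the full address plus "richmondmike", "gmail.com", "gmail" and "com".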


I then added the following to my schema configuration:
    <fieldtype name="email" class="solr.StrField">
        <analyzer type="index">
            <tokenizer class="com.willetts.wmail.analysis.EmailTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldtype>


If I then fire up Solr and use the analysis tool from the admin page,
it seems to work exactly as I would expect (i.e. e-mail addresses that
I type in do get broken up into the correct tokens).  However, when I
add data to this index and then attempt to perform a search using the
search interface, I cannot get any matches.  For example, when I add
"richmondmike@gmail.com" to a field that has type "email" (see schema
configuration above), I cannot get the terms "richmondmike", "gmail",
or "gmail.com" to match any of the results.


Do I need to use a custom fieldtype class as well instead of using
"solr.StrField"?  Any help would be greatly appreciated.


Thanks in advance,

Mike
