lucene-solr-user mailing list archives

From Martin Grotzke <martin.grot...@javakaffee.de>
Subject Re: Indexing question - split word and comma
Date Thu, 05 Jul 2007 19:39:22 GMT
On Thu, 2007-07-05 at 11:56 -0700, Mike Klaas wrote:
> On 5-Jul-07, at 11:43 AM, Martin Grotzke wrote:
> 
> > Hi all,
> >
> > I have a document with a name field like this:
> > <field name='name'>MP3-Player, Apple, &#xBB;iPod nano&#xAB;, silber,
> > 4GB</field>
> >
> > and want to find "apple". Unfortunately, I only find "apple,"...
> >
> > Can anybody help me with this?
> 
> Sure: you're using WhitespaceAnalyzer, which only splits on  
> whitespace.  If you want to split words from punctuation, you should  
> use something like StandardAnalyzer or WordDelimiterFilter.
I replaced <tokenizer class="solr.WhitespaceTokenizerFactory"/> with
<tokenizer class="solr.StandardTokenizerFactory"/> in the index analyzer
of the fieldtype definition, and now I find both "apple" and "ipod".
Really great!
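
For anyone reading this in the archive, here is a rough sketch of what the
changed fieldtype might look like in schema.xml. The fieldtype name "text"
and the lowercase filters are assumptions for illustration (my real
definition has more filters); only the tokenizer swap in the index analyzer
is the actual change described above.

```xml
<!-- Sketch only: the name "text" and the LowerCaseFilterFactory entries are assumed -->
<fieldtype name="text" class="solr.TextField">
  <analyzer type="index">
    <!-- was: <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
    <!-- StandardTokenizer drops the trailing comma, so "Apple," is indexed as "Apple" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- assumed: lowercasing so that a query for "apple" matches "Apple" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side left unchanged in this sketch -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Documents indexed before the change have to be reindexed for the new
tokens to show up.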

> 
> It is also extremely helpful to look at the analysis page on the solr  
> admin (verbose=true) and see exactly what tokens your analyzer produces.
This is such a cool tool, I didn't know about it! It's really great that
you can see the output of each filter step, which makes it much easier to
understand what's going on during indexing. Really, really cool!!

Thanks a lot,
cheers,
Martin


> 
> -Mike
> 

