lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Trouble Configuring WordDelimiterFilterFactory
Date Mon, 30 Nov 2009 13:55:55 GMT
I think the problem here is that the tokenizer underlying
WordDelimiterFilterFactory is StandardTokenizer; at least that's what I
infer from here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

I think you want to use a different tokenizer, because StandardTokenizer
may be stripping the decimal point from .355. But that's just a guess. You'll
get more info if you examine your index and see what's *really* indexed in
these fields.
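
As a rough sketch of what swapping the tokenizer might look like in
schema.xml: the field type name "text_wdf_ws" is made up for illustration,
the filter line is copied from the settings quoted below, and the choice of
WhitespaceTokenizerFactory is only an assumption about which tokenizer would
leave ".355" alone. None of this has been verified against the index.

```xml
<!-- Sketch only: "text_wdf_ws" is a hypothetical field type name. -->
<fieldType name="text_wdf_ws" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer element applies to both index and query time. -->
  <analyzer>
    <!-- Assumption: splitting on whitespace only hands ".355" to the
         filter as one token, leading dot included. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Filter settings as quoted in the thread, unchanged. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="1" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

Whether the filter then still emits a bare 355 next to .355 is exactly the
kind of thing to check in the index, or on the analysis page of the Solr
admin UI, after reindexing.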

Best
Erick

On Sun, Nov 29, 2009 at 10:31 AM, Rahul R <rahul.solr@gmail.com> wrote:

> Steve,
> My settings for both index and query are :
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="0" catenateWords="1" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1"/>
>
> Let me give an example. Suppose I have the following 2 documents:
> Document 1(Text Field): Bridge-Diode .355 Volts
> Document 2(Text Field): Bridge-Diode 355 Volts
>
> Requirement : Search for ".355" should retrieve only document 1 (Not
> happening now)
> Requirement: Search for "Bridge" should retrieve both documents (Works as
> expected)
>
> The reason why a search for ".355" is retrieving both documents is that
> term texts for .355 in the document are created as .355 and 355. Even if I set
> generateWordParts and catenateWords to "0", the way term texts are created
> for ".355" does not change.
>
> Thank you for your time.
>
> Regards
> Rahul
>
> On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe <sarowe@syr.edu> wrote:
>
> > Hi Rahul,
> >
> > On 11/26/2009 at 12:53 AM, Rahul R wrote:
> > > Is there a way by which I can prevent the WordDelimiterFilterFactory
> > > from totally acting on numerical data ?
> >
> > "prevent ... from totally acting on" is pretty vague, and nowhere AFAICT
> > do you say precisely what it is you want.
> >
> > It would help if you could give example text and the terms you think
> > should be the result of analysis of the text.  If you want different
> > index/query time behavior, please provide this info for both.
> >
> > Steve
> >
> >
>
