lucene-java-user mailing list archives

From Halácsy Péter <>
Subject RE: Lucene and the numbers (again!)
Date Wed, 14 Nov 2001 09:09:43 GMT
I had the same problem (searching for Alfa 147 [it's a very cool car]).

SimpleAnalyzer uses LowerCaseTokenizer. This "divides text at non-letters
and converts them to lower case." (source: API docs) Since digits are
non-letters, the string "147" yields no tokens at all, so "Alfa 147" is
indexed as the single token "alfa".

I got numbers working in queries once I used StandardAnalyzer both at
query time and at index time. Try it!
(org.apache.lucene.analysis.standard package)
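
To make the effect concrete, here is a minimal sketch in plain Java of what splitting at non-letters does to "Alfa 147". It is not Lucene's code: the class LetterSplitDemo, the method tokenize and the keepDigits flag are made up for illustration. Setting keepDigits to true is only a rough stand-in for an analyzer that keeps numbers; StandardAnalyzer's real tokenizer is grammar-based and does more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Lucene.
class LetterSplitDemo {

    // Splits text into lowercased tokens.
    // keepDigits == false: only letters are token characters, as the API docs
    //                      describe for LowerCaseTokenizer.
    // keepDigits == true:  letters and digits are token characters (an assumed,
    //                      simplified stand-in for a number-preserving analyzer).
    static List<String> tokenize(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean tokenChar = keepDigits
                    ? Character.isLetterOrDigit(c)
                    : Character.isLetter(c);
            if (tokenChar) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Alfa 147", false)); // [alfa]
        System.out.println(tokenize("147", false));      // []
        System.out.println(tokenize("Alfa 147", true));  // [alfa, 147]
    }
}
```

With letter-only splitting the number never reaches the index or the query, which is why the same number-preserving analyzer has to be used on both sides.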


> -----Original Message-----
> From: Steven J. Owens []
> Sent: Tuesday, November 13, 2001 8:07 PM
> To: Lucene Users List; David Bonilla
> Subject: Re: Lucene and the numbers (again!)
> David,
> > Yeah, I know that I'm not very original and maybe the FAQ can resolve
> > my problems, but I didn't find any real help there. Ok... here we go:
>      This is indeed a FAQ, and it also comes up often on the list, if
> you check the archives.
>      Come to think of it, where *are* the archives now?  I'm looking
> at and I don't see the more recent
> (post-move-to-jakarta) postings there.  The archive seems to end on
> October 5th.  Are we using a new archive now?  Are the messages from
> the old archive there?
> > The problem is... I have a J2EE application working with LUCENE and
> > the basic searching works properly, but when I try to use a query
> > with a number, Lucene gives me back all my indexed documents and I
> > don't understand why.
> >
> > I'm using the SimpleAnalyzer. Suppose, for example, I have a document
> > with a field named "name". How can I search for a name like '456'?
>      If you look into... hm, well, I was going to say if you look into
> the API docs, but it's not that simple.  I remember somebody in the
> past saying (on this list) to simply use StandardAnalyzer instead of
> StopAnalyzer.  I don't know if this works (I'll have to take time
> later this afternoon and check the source code - this is one of the
> things that's been on my to-do list for a month or so, but I've been
> preoccupied with other areas of my project).
>      I guess I should note the following details:
> StandardAnalyzer is not listed in the "Package
> org.apache.lucene.analysis" page of the API docs (from the
> lucene-1.2-rc2 checkout), just SimpleAnalyzer and StopAnalyzer.
> When I track down the API docs for StandardAnalyzer and compare them,
> neither StandardAnalyzer nor StopAnalyzer says anything about numbers.
> Nor do StandardFilter or StopFilter mention numbers.
> Searching for "numeric" turns up nothing, searching for "number" turns
> up:
> "27. How does Lucene handle numbers and special characters ?
> This depends of the analyzer you are using for indexing an searching."
>      I checked out the Lucene FAQ source in the past, with the intent
> of going through it and checking for typos, etc, as a good way to
> force myself to read it all as well as contributing something back to
> Lucene.  I think that was from the pre-jakarta days, I should probably
> get a fresh, jakarta-based checkout.  Is all of that stuff still in
> the "website" checkout?
> Steven J. Owens