lucene-solr-user mailing list archives

From Bogdan Vatkov <>
Subject Re: Stopwords not working as expected
Date Sun, 03 Jan 2010 02:31:48 GMT
@Mahout experts: could you please elaborate on that?
It seems that the stopwords mechanism in Solr is successfully filtering out
quite a few words (I get no search results when querying for stopwords through
the localhost/solr/select interface), but this somehow has no effect when the
Solr index is converted to vectors by the
org.apache.mahout.utils.vectors.lucene.Driver class.
As a result I get clusters which contain (and are even mainly driven by) the
stopwords.
I am still not an expert in reading from a Lucene index - is it possible that
the vector generation does some "raw" reading of the Solr/Lucene index and
thus picks up the stopwords?
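
To make the question concrete, here is a toy Python model of the stored vs.
indexed distinction described in the quoted reply below. It is only a sketch
under loud assumptions: it does not use the Solr, Lucene, or Mahout APIs, the
names (STOPWORDS, analyze, add_document, search) are hypothetical, and it says
nothing about what the Mahout Driver actually reads. It only shows that terms
taken from the analyzed index lack the stopwords, while terms re-derived from
the stored text still contain them.

```python
import re

# Hypothetical stopword list standing in for stopwords.txt (assumption).
STOPWORDS = {"the", "a", "of", "and", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(text, stopwords=STOPWORDS):
    """Toy analysis chain: tokenize, then drop stopwords."""
    return [t for t in tokenize(text) if t not in stopwords]


def add_document(index, stored, doc_id, text):
    """Keep the raw text as the stored copy and index only analyzed terms."""
    stored[doc_id] = text
    for term in analyze(text):
        index.setdefault(term, set()).add(doc_id)


def search(index, term):
    """Look a single term up in the toy inverted index."""
    return sorted(index.get(term.lower(), set()))


index, stored = {}, {}
add_document(index, stored, 1, "The quick fox and the lazy dog")

# Terms as seen through the inverted index (stopwords removed).
terms_from_index = sorted(index)

# Terms re-derived from the stored copy without the analysis chain
# (stopwords still present).
terms_from_stored = sorted(set(tokenize(stored[1])))
```

In this model, search(index, "the") finds nothing, matching what I see through
the select interface, yet "the" would still show up in any vector built from
terms_from_stored.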

Best regards,

On Sun, Jan 3, 2010 at 3:51 AM, Lance Norskog <> wrote:

> Fields are both stored and indexed. The stored copy is exactly what
> you sent in. The index is built with the "text" type's analysis stack
> and is not stored. This output has the stopwords removed. The output
> is not stored in one place, but parts of it are scattered around the
> Lucene index data structures.  When you search for one of these
> stopwords, you should not get any documents.
> On Sat, Jan 2, 2010 at 5:20 PM, Bogdan Vatkov <>
> wrote:
> > Hi,
> >
> > I am using a default (example) configuration of Solr and there the
> > stopwording seems to be enabled for both indexing and querying of fields of
> > type "text".
> > I have a custom field which is of the "text" type.
> > I have extended the stopwords.txt file with lots of words but when I index
> > some documents the index contains stopwords - I can see this with the Luke
> > tool.
> > Am I supposed to see these terms in the index after they are declared in the
> > stopwords.txt file?
> > What could be wrong?
> >
> > Best regards,
> > Bogdan
> >
> --
> Lance Norskog
