lucene-solr-user mailing list archives

From Srinivasa Meenavalli <Smeenav...@zensar.com>
Subject RE: Default stop word list
Date Fri, 26 Aug 2016 06:25:38 GMT
Hi Steven,

The list of stop words for a language is not fixed; there is no single universal list of stop words
used by all natural language processing tools.
Ideally, stop words should be defined by search merchandisers based on their domain, instead of
relying on the default list.

https://en.wikipedia.org/wiki/Stop_words

You can add your own lang/stopwords_<languagecode>.txt file and reference it from the field type, for example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" expand="true" synonyms="synonyms.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>
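
As a rough sketch of the domain-specific approach, assuming you create your own file (for example a copy of the default English list with "i" added), a field type could point at it as shown here. The names text_en_custom and lang/stopwords_en_custom.txt are made up for illustration and are not Solr defaults.

```xml
<!-- Hypothetical field type: "text_en_custom" and
     "lang/stopwords_en_custom.txt" are illustrative names only.
     The stopword file is assumed to be plain text with one word per line;
     lines starting with # are treated as comments. -->
<fieldType name="text_en_custom" class="solr.TextField" positionIncrementGap="100">
    <!-- An analyzer with no "type" attribute is used at both index and query time -->
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en_custom.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Documents indexed before a stopword change still contain the old tokens, so a reindex is normally needed for the new list to take full effect.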

Regards
Srinivas Meenavalli

-----Original Message-----
From: Steven White [mailto:swhite4141@gmail.com]
Sent: Friday, August 26, 2016 4:02 AM
To: solr-user@lucene.apache.org
Subject: Default stopword list

Hi everyone,

I'm curious: how was the current "default" stopword list, for English and other languages,
determined?  And for English, why is "I" not in the stopword list?

Thanks in advance.

Steve