lucene-solr-user mailing list archives

From Bogdan Vatkov <bogdan.vat...@gmail.com>
Subject Re: Unstemming after solr.PorterStemFilterFactory
Date Tue, 19 Jan 2010 22:45:18 GMT
I am using fields like:
  <field name="msg_body" type="body_text" termVectors="true" indexed="true" stored="true"/>
These contain multi-line text, not just single strings. What does "stored
values" mean? I am relatively new to Solr.

I solved my issue by copying the SnowballPorterFilterFactory class and
extending it into a new SnowballPorterWithUnstemLowerCaseFilterFactory.
I moved lowercasing inside the factory because I need to capture the original
terms and store them in a side file first, and only then lowercase and stem.
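
To illustrate the ordering described above (record the original term first, then lowercase, then stem), here is a minimal stand-alone sketch. It is not the code of SnowballPorterWithUnstemLowerCaseFilterFactory and does not use the Lucene/Solr APIs: the class name UnstemRecorder, the toyStem helper (a crude stand-in for the real Snowball stemmer) and the tab-separated side-file layout are all assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not the actual custom Solr factory. */
public class UnstemRecorder {
    // stem -> original (unstemmed, original-case) terms seen for that stem
    private final Map<String, Set<String>> originalsByStem = new LinkedHashMap<>();

    /** Toy stand-in for a Porter/Snowball stemmer: strips a few common suffixes. */
    static String toyStem(String lowercased) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (lowercased.length() > suffix.length() + 2 && lowercased.endsWith(suffix)) {
                return lowercased.substring(0, lowercased.length() - suffix.length());
            }
        }
        return lowercased;
    }

    /** Lowercases and stems the token, remembering the original form under its stem. */
    public String process(String originalToken) {
        String stem = toyStem(originalToken.toLowerCase(Locale.ROOT));
        originalsByStem.computeIfAbsent(stem, k -> new TreeSet<>()).add(originalToken);
        return stem;
    }

    /** Returns the original terms recorded for a stem (empty if none). */
    public Set<String> originalsFor(String stem) {
        return originalsByStem.getOrDefault(stem, Collections.emptySet());
    }

    /** One line per stem; the layout is an assumed format for a side file. */
    public String toSideFileText() {
        StringBuilder sb = new StringBuilder();
        originalsByStem.forEach((stem, originals) ->
            sb.append(stem).append('\t').append(String.join(",", originals)).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        UnstemRecorder recorder = new UnstemRecorder();
        for (String token : "Indexing indexed Indexes stemming".split(" ")) {
            recorder.process(token);
        }
        System.out.print(recorder.toSideFileText());
    }
}
```

The point of the sketch is only the order of operations: if lowercasing ran as a separate filter earlier in the chain, the original casing would already be lost by the time the stemming step could record it.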

    <fieldType name="body_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<!--        <filter class="solr.LowerCaseFilterFactory"/> -->
<!--        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/> -->
        <filter class="org.bogdan.solr.analysis.SnowballPorterWithUnstemLowerCaseFilterFactory"
                language="English" protected="protwords.txt" unstemmed="unstemmed.txt"/>
      </analyzer>

Is there an easier way to do this, without the custom filter that I wrote?

Best regards,
Bogdan

On Wed, Jan 20, 2010 at 12:38 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Bogdan,
>
> You can get them from stored values of your fields, if you are storing
> them.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: Bogdan Vatkov <bogdan.vatkov@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, January 19, 2010 5:28:51 PM
> > Subject: Unstemming after solr.PorterStemFilterFactory
> >
> > Hi,
> >
> > I am indexing with the solr.PorterStemFilterFactory included but then I need
> > to access the unstemmed versions of the terms, what would be the easiest way
> > to get the unstemmed version?
> > Thanks in advance.
> >
> > Best regards,
> > Bogdan
>
>


