lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2341) explore morfologik integration
Date Mon, 20 Jun 2011 22:16:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052246#comment-13052246 ]

Robert Muir commented on LUCENE-2341:
-------------------------------------

Hi Michał,

This patch looks great!

I took a quick glance; here are a couple of suggestions:
* In the MorfologikFilter, I think we should implement reset(), first calling the superclass
reset(), then clearing the stemsAcc list. This ensures that all of the filter's state is cleared
before it is reused. Under normal operation this should not be necessary, but some consumers
in Lucene (e.g. LimitTokenCountFilter, and some similar code in the Highlighter) consume the
stream only up to some point and then stop. By clearing this list in reset() we ensure that
no leftover stems can appear in the next stream.
* Because the data is licensed under the MPL, I think we should explicitly include a hyperlink,
if possible, to the source code in NOTICE.txt. I saw you included some wording in LICENSE.txt,
but I think that file should only say 'XYZ data is under this license', followed by the actual
MPL license text. NOTICE.txt is where we should link to the source code, I think... there is some
more information on this under the section "Category B: Reciprocal Licenses" at http://www.apache.org/legal/3party.html
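
To illustrate the reset() suggestion, here is a minimal, self-contained sketch of the pattern.
It deliberately does not use the real Lucene classes: SketchStream, WordSource and
BufferingStemFilter are hypothetical stand-ins, and the "stemming" just appends fake suffixes.
Only the name stemsAcc comes from the patch. The point is the shape of reset(): delegate
upstream first (the analogue of calling the superclass reset()), then clear the buffered stems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; this is NOT Lucene's TokenStream API. */
abstract class SketchStream {
    /** Returns the next token, or null when the stream is exhausted. */
    abstract String next();

    /** Clears any per-stream state so the instance can be reused. */
    void reset() {}
}

/** Hypothetical source that replays a fixed list of words. */
class WordSource extends SketchStream {
    private List<String> words = new ArrayList<>();
    private Iterator<String> it = words.iterator();

    /** Points the source at new input, as happens when an analysis chain is reused. */
    void setWords(String... w) {
        words = Arrays.asList(w);
        it = words.iterator();
    }

    @Override
    String next() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    void reset() {
        it = words.iterator();
    }
}

/** Hypothetical filter that emits several stems per word, buffering them in stemsAcc. */
class BufferingStemFilter extends SketchStream {
    private final SketchStream input;
    private final List<String> stemsAcc = new ArrayList<>();

    BufferingStemFilter(SketchStream input) {
        this.input = input;
    }

    @Override
    String next() {
        if (stemsAcc.isEmpty()) {
            String word = input.next();
            if (word == null) {
                return null;
            }
            // Fake "stemming" for illustration: two made-up stems per word.
            stemsAcc.add(word + "/stem1");
            stemsAcc.add(word + "/stem2");
        }
        return stemsAcc.remove(0);
    }

    @Override
    void reset() {
        input.reset();    // analogue of calling the superclass reset() first
        stemsAcc.clear(); // drop stems left over from a partially consumed stream
    }
}

class ResetSketchDemo {
    public static void main(String[] args) {
        WordSource src = new WordSource();
        BufferingStemFilter filter = new BufferingStemFilter(src);

        src.setWords("kot");
        System.out.println(filter.next()); // consumer stops early; one stem stays buffered

        src.setWords("pies");
        filter.reset();
        System.out.println(filter.next()); // first token now comes from the new input
    }
}
```

Without the stemsAcc.clear() call in this sketch, the first token after reuse would be the
leftover stem of the previous word, which is the situation a partial consumer can trigger.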


> explore morfologik integration
> ------------------------------
>
>                 Key: LUCENE-2341
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2341
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Dawid Weiss
>         Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option for users.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

