lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7760) StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
Date Sun, 02 Apr 2017 22:28:41 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952875#comment-15952875 ]

Steve Rowe commented on LUCENE-7760:
------------------------------------

Oh, I forgot to mention: UAX29URLEmailTokenizer has the same issue, and would benefit from
the same javadoc fix (and tests).
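
A test along those lines might look roughly like the sketch below. This is only a sketch, not verified output: it assumes UAX29URLEmailTokenizer chops over-length tokens the same way StandardTokenizer does, and the expected split points simply mirror the StandardAnalyzer test quoted in the issue description.

{noformat}
import java.io.StringReader;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;

public class TestUAX29URLEmailMaxTokenLength extends BaseTokenStreamTestCase {

  public void testMaxTokenLengthNonDefault() throws Exception {
    UAX29URLEmailTokenizer tokenizer = new UAX29URLEmailTokenizer(newAttributeFactory());
    tokenizer.setMaxTokenLength(5);
    tokenizer.setReader(new StringReader("ab cd toolong xy z"));
    // Assumed behavior, mirroring the StandardAnalyzer test in the issue
    // description: the over-length token is chopped into 5-char pieces
    // rather than discarded.
    assertTokenStreamContents(tokenizer, new String[] {"ab", "cd", "toolo", "ng", "xy", "z"});
  }
}
{noformat}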

> StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
> -----------------------------------------------------------------
>
>                 Key: LUCENE-7760
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7760
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.6
>
>         Attachments: LUCENE-7760.patch
>
>
> The javadocs claim that too-long tokens are discarded, but in fact they are simply chopped up.  The following test case unexpectedly passes:
> {noformat}
>   public void testMaxTokenLengthNonDefault() throws Exception {
>     StandardAnalyzer a = new StandardAnalyzer();
>     a.setMaxTokenLength(5);
>     assertAnalyzesTo(a, "ab cd toolong xy z", new String[]{"ab", "cd", "toolo", "ng", "xy", "z"});
>     a.close();
>   }
> {noformat}
> We should at least fix the javadocs ...
> (I hit this because I was trying to also add {{setMaxTokenLength}} to {{EnglishAnalyzer}}).





