lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8676) TestKoreanTokenizer#testRandomHugeStrings failure
Date Fri, 01 Feb 2019 10:40:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758204#comment-16758204 ]

ASF subversion and git services commented on LUCENE-8676:
---------------------------------------------------------

Commit 5667170cf58732384f185b2983b1f5a21d26369e in lucene-solr's branch refs/heads/branch_7x
from Jim Ferenczi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5667170 ]

LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
by a big buffer (1024 chars).


> TestKoreanTokenizer#testRandomHugeStrings failure
> -------------------------------------------------
>
>                 Key: LUCENE-8676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8676
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>            Priority: Major
>         Attachments: LUCENE-8676.patch
>
>
> TestKoreanTokenizer#testRandomHugeStrings failed in CI with the following exception:
> {noformat}
>    [junit4]    > Throwable #1: java.lang.AssertionError
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([8C5E2BE10F581CB:90E6857D4E833D83]:0)
>    [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.add(KoreanTokenizer.java:334)
>    [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.parse(KoreanTokenizer.java:707)
>    [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.incrementToken(KoreanTokenizer.java:377)
>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:748)
>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:659)
>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:561)
>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:474)
>    [junit4]    >        at org.apache.lucene.analysis.ko.TestKoreanTokenizer.testRandomHugeStrings(TestKoreanTokenizer.java:313)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:748)
>    [junit4]   2> NOTE: leaving temporary files
> {noformat}
> I am able to reproduce locally with:
> {noformat}
> ant test -Dtestcase=TestKoreanTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=8C5E2BE10F581CB -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.7/test-data/enwiki.random.lines.txt -Dtests.locale=uk-UA -Dtests.timezone=Europe/Istanbul -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
> {noformat}
> After some investigation I found that the position of the buffer is not updated when the maximum backtrace size (1024) is reached.
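
To illustrate the kind of bookkeeping error described above, here is a minimal toy model in Java. It is not the KoreanTokenizer code and does not reproduce the committed patch: the class name, the method name and the boolean switch are all hypothetical, and only the 1024 limit is taken from the issue text. It assumes a scanner that forces a flush (standing in for a backtrace) once the number of pending characters reaches the limit, and it shows that the pending window stays bounded only if the last flushed position is advanced at that point.

```java
// Toy model only: NOT the Lucene implementation. All names are hypothetical.
final class BacktraceWindowSketch {

    // Limit taken from the issue text (1024 chars); the constant name is an assumption.
    static final int MAX_BACKTRACE_GAP = 1024;

    /**
     * Scans positions 1..inputLength and returns the largest number of
     * pending (not yet flushed) characters seen at any point.
     *
     * @param updateLastPosOnForcedFlush when false, simulates forgetting to
     *        advance the last flushed position after a forced flush
     */
    static int maxPending(int inputLength, boolean updateLastPosOnForcedFlush) {
        int lastFlushPos = 0;
        int maxPending = 0;
        for (int pos = 1; pos <= inputLength; pos++) {
            int pending = pos - lastFlushPos;
            maxPending = Math.max(maxPending, pending);
            if (pending >= MAX_BACKTRACE_GAP) {
                // Forced flush because the window is full.
                if (updateLastPosOnForcedFlush) {
                    lastFlushPos = pos; // keep the bookkeeping in sync
                }
                // Otherwise lastFlushPos goes stale and the window keeps growing.
            }
        }
        return maxPending;
    }

    public static void main(String[] args) {
        System.out.println(maxPending(5000, true));  // prints 1024
        System.out.println(maxPending(5000, false)); // prints 5000
    }
}
```

In this model the stale position lets the pending window grow without bound on long inputs. That is one plausible way for an invariant assertion to trip only on huge random strings, but the actual failure mechanism in KoreanTokenizer should be checked against the attached patch.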



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


