lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use
Date Mon, 03 Nov 2014 20:42:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195053#comment-14195053 ]

Michael McCandless commented on LUCENE-6046:
--------------------------------------------

I like the test simplifications, and removing dead code from Operations.determinize.

Can we fix the exception thrown from RegExp to include the offending regular expression, and
fix the test to confirm the message contains it?  Maybe also add RegExp.toStringTree?  I found
it very useful while debugging the original regexp :)
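
To make the first request concrete, here is a minimal sketch of one way to get the offending expression into the message: catch the low-level failure and rethrow it with the regexp attached. This is not the patch on the issue. RegexErrorContext and withRegexInMessage are hypothetical names used only for illustration, and the sketch assumes the underlying failure surfaces as an IllegalArgumentException, as in the stack trace quoted below.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: wraps a build step so that any
// IllegalArgumentException it throws is rethrown with the regexp in the message.
final class RegexErrorContext {
    private RegexErrorContext() {}

    static <T> T withRegexInMessage(String regex, Supplier<T> build) {
        try {
            return build.get();
        } catch (IllegalArgumentException cause) {
            // Keep the original exception as the cause so its stack trace is not lost.
            throw new IllegalArgumentException(
                "cannot build automaton for regexp \"" + regex + "\": " + cause.getMessage(),
                cause);
        }
    }
}
```

A test can then trigger the failure and check that getMessage() contains the exact regexp string that was passed in.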

I think QueryParserBase should also have a set/get for this option?
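
A rough sketch of what such a set/get pair could look like, assuming the option is an upper bound on the number of states determinization may create. The class name, the property name maxDeterminizedStates, and the default value are all assumptions made for illustration; none of them is taken from the patch or from QueryParserBase.

```java
// Hypothetical stand-in for a query parser base class; not Lucene code.
final class ParserOptionsSketch {
    // Assumed default, chosen only for this example.
    static final int DEFAULT_MAX_DETERMINIZED_STATES = 10_000;

    private int maxDeterminizedStates = DEFAULT_MAX_DETERMINIZED_STATES;

    /** Returns the current limit on states created while determinizing. */
    int getMaxDeterminizedStates() {
        return maxDeterminizedStates;
    }

    /** Sets the limit; rejects values that could never allow any automaton. */
    void setMaxDeterminizedStates(int maxDeterminizedStates) {
        if (maxDeterminizedStates <= 0) {
            throw new IllegalArgumentException(
                "maxDeterminizedStates must be positive, got " + maxDeterminizedStates);
        }
        this.maxDeterminizedStates = maxDeterminizedStates;
    }
}
```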

> RegExp.toAutomaton high memory use
> ----------------------------------
>
>                 Key: LUCENE-6046
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.10.1
>            Reporter: Lee Hinman
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-6046.patch, LUCENE-6046.patch
>
>
> When creating an automaton from an org.apache.lucene.util.automaton.RegExp, it's possible for the automaton to use so much memory it exceeds the maximum array size for java.
> The following caused an OutOfMemoryError with a 32gb heap:
> {noformat}
> new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
> {noformat}
> When increased to a 60gb heap, the following exception is thrown:
> {noformat}
>   1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
>   1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
>   1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
>   1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
>   1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
>   1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
> {noformat}
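
For readers unfamiliar with why determinization can need this much memory, the following is a textbook illustration, not Lucene's Operations.determinize and not an analysis of the regexp in the report. It runs subset construction on the NFA for (a|b)*a(a|b){n}, which is known to need 2^(n+1) DFA states, and gives up once a caller-supplied limit is passed. The class name, method names, and the idea of a state limit are assumptions made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: counts DFA states for the NFA of (a|b)*a(a|b){n}.
final class SubsetConstructionSketch {
    private SubsetConstructionSketch() {}

    static int countDfaStates(int n, int maxStates) {
        int accept = n + 1; // NFA states are 0..n+1; state n+1 is accepting
        BitSet start = new BitSet();
        start.set(0);
        Set<BitSet> seen = new HashSet<>();   // each BitSet is one DFA state
        Deque<BitSet> work = new ArrayDeque<>();
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            BitSet current = work.remove();
            for (char c : new char[] {'a', 'b'}) {
                BitSet next = step(current, c, accept);
                if (seen.add(next)) {
                    if (seen.size() > maxStates) {
                        throw new IllegalStateException(
                            "determinizing would need more than " + maxStates + " states");
                    }
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    // Computes the set of NFA states reachable from 'states' on character c.
    private static BitSet step(BitSet states, char c, int accept) {
        BitSet next = new BitSet();
        for (int s = states.nextSetBit(0); s >= 0; s = states.nextSetBit(s + 1)) {
            if (s == 0) {
                next.set(0);                 // (a|b)* loops on both letters
                if (c == 'a') next.set(1);   // guess that this 'a' is the marked one
            } else if (s < accept) {
                next.set(s + 1);             // consume one of the n trailing characters
            }
            // the accepting state has no outgoing transitions
        }
        return next;
    }
}
```

Each extra unit of n doubles the state count, so a bounded repetition placed after a nondeterministic prefix can exhaust memory well before the pattern itself looks large. Failing early with a limit, as sketched here, is one possible alternative to running until an array can no longer grow.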



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


