hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13196) UDFLike: reduce Regex NFA sizes
Date Wed, 02 Mar 2016 07:49:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175218#comment-15175218 ]

Gopal V commented on HIVE-13196:
--------------------------------

I wrote a JMH benchmark that explains this change: https://github.com/t3rmin4t0r/regexbench

{code}
# Run complete. Total time: 00:00:41

Benchmark                       Mode  Cnt    Score    Error  Units
RegexBench.testGreedyRegexHit   avgt    5  340.991 ±  7.929  ns/op
RegexBench.testGreedyRegexMiss  avgt    5  466.184 ± 21.349  ns/op
RegexBench.testLazyRegexHit     avgt    5   72.456 ± 16.156  ns/op
RegexBench.testLazyRegexMiss    avgt    5  366.955 ± 49.159  ns/op
{code}

> UDFLike: reduce Regex NFA sizes
> -------------------------------
>
>                 Key: HIVE-13196
>                 URL: https://issues.apache.org/jira/browse/HIVE-13196
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 1.3.0, 1.2.1, 2.0.0, 2.1.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-13196.1.patch
>
>
> The NFAs built from complex regexes in UDFLike are extremely complex and spend a lot of time doing simple expression matching with no backtracking.
> Prevent NFA -> DFA explosion by using reluctant regex matches instead of greedy matches.
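
The greedy versus reluctant distinction in the description can be sketched as below. This is a minimal illustration, not the code in HIVE-13196.1.patch: the class `LikeRegexSketch` and the helpers `likeToRegex` and `like` are hypothetical names, and escape handling for LIKE patterns is left out. It assumes `%` is translated to `.*` (greedy) or `.*?` (reluctant) and `_` to `.`, with every other character quoted literally.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: not Hive's UDFLike, only an illustration of the idea.
class LikeRegexSketch {

    // Translate a SQL LIKE pattern into a Java regex string.
    // reluctant == true emits ".*?" for '%', otherwise the greedy ".*".
    // Assumption: no escape character support, to keep the sketch short.
    static String likeToRegex(String like, boolean reluctant) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '%') {
                sb.append(reluctant ? ".*?" : ".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                // Quote literals so regex metacharacters are matched as-is.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // Whole-string match, which is what LIKE semantics require.
    static boolean like(String input, String likePattern, boolean reluctant) {
        Pattern p = Pattern.compile(likeToRegex(likePattern, reluctant), Pattern.DOTALL);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String pattern = "%abc%def%";
        // Both variants must agree on the result; only the matching work differs.
        System.out.println(like("xxabcyydefzz", pattern, false));
        System.out.println(like("xxabcyydefzz", pattern, true));
        System.out.println(like("xxabcyy", pattern, false));
        System.out.println(like("xxabcyy", pattern, true));
    }
}
```

Because the match is anchored to the whole string, the two quantifiers accept exactly the same inputs; what changes is how much of the input the engine consumes before it tries the next literal. A greedy `.*` first runs to the end of the input and backs off, while a reluctant `.*?` extends one character at a time.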



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
