spark-issues mailing list archives

From "Aditya Addepalli (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-31094) Removing redundant rules in the output of Frequent Pattern Growth Algorithm
Date Mon, 09 Mar 2020 13:00:09 GMT
Aditya Addepalli created SPARK-31094:
----------------------------------------

             Summary: Removing redundant rules in the output of Frequent Pattern Growth Algorithm
                 Key: SPARK-31094
                 URL: https://issues.apache.org/jira/browse/SPARK-31094
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
    Affects Versions: 2.4.5
            Reporter: Aditya Addepalli


The proposal is to implement an is.redundant() function similar to the one in the R arules package: [https://rdrr.io/cran/arules/man/is.redundant.html]

By definition:

A rule is redundant if a more general rule with the same or a higher confidence exists. That
is, a more specific rule is redundant if it is only equally or even less predictive than a
more general rule.
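
The definition above can be sketched in plain Python. This is only an illustration under stated
assumptions, not the proposed Spark implementation: "more general" is read as "same consequent
and a proper subset of the antecedent", the Rule type and the function names are hypothetical,
and a rule is compared only against other rules present in the input list.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for one association rule."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def is_redundant(rule: Rule, rules: List[Rule]) -> bool:
    """True if a more general rule with the same or higher confidence exists.

    Assumption: "more general" means same consequent and an antecedent
    that is a proper subset of this rule's antecedent.
    """
    return any(
        other.consequent == rule.consequent
        and other.antecedent < rule.antecedent  # proper-subset test
        and other.confidence >= rule.confidence
        for other in rules
    )


def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep only the non-redundant rules (quadratic pairwise scan)."""
    return [r for r in rules if not is_redundant(r, rules)]


# Made-up example data, for illustration only.
rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.80),
    Rule(frozenset({"bread", "milk"}), frozenset({"butter"}), 0.75),
    Rule(frozenset({"bread", "jam"}), frozenset({"butter"}), 0.90),
]
kept = remove_redundant(rules)
```

Here {bread, milk} => {butter} is dropped because the more general {bread} => {butter} has higher
confidence, while {bread, jam} => {butter} is kept because it is more predictive than the general
rule. The pairwise scan is quadratic in the number of rules, so a distributed version would need
a different strategy.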

As FP Growth is an exhaustive algorithm, many of the rules it produces are redundant. Therefore
there is merit in implementing this function in Spark. It would not only reduce the total number
of rules in the output, but also improve the remaining set, since every rule that is kept is more
predictive than each of its more general rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


