spark-dev mailing list archives

From Caique Marques <caiquermarque...@gmail.com>
Subject Re: Python API for Association Rules
Date Thu, 03 Dec 2015 04:07:32 GMT
Hi Joseph,
Sorry for my mistake; I will comment on the JIRA.

Thanks.
Caique.

2015-12-02 19:12 GMT-02:00 Joseph Bradley <joseph@databricks.com>:

> If you're working on a feature, please comment on the JIRA first (to avoid
> conflicts / duplicate work).  Could you please copy what you wrote to the
> JIRA to discuss there?
> Thanks,
> Joseph
>
> On Wed, Dec 2, 2015 at 4:51 AM, caiquermarques95 <
> caiquermarques95@gmail.com> wrote:
>
>> Hello everyone!
>> I'm working on the Python API for association rules (
>> https://issues.apache.org/jira/browse/SPARK-8855), but I have a question.
>>
>> According to the issue description, an important method is
>> "*FPGrowthModel.generateAssociationRules()*", of course. However, it is
>> not clear whether the wrapper for the association rules should go in
>> "*FPGrowthModelWrapper.scala*", and this is the problem.
>>
>> My idea is the following:
>> 1) In the fpm.py file: a class "AssociationRules" with one method and one
>> class:
>> 1.1) Method train(data, minConfidence), which will generate the
>> association rules for the data with the specified minConfidence (0.6 by
>> default). This method will call "trainAssociationRules" from
>> *PythonMLLibAPI* with the parameters data and minConfidence. It will then
>> return an FPGrowthModel.
>> 1.2) Class Rule, which will be a namedtuple representing an (antecedent,
>> consequent) tuple.
>>
>> 2) Still in fpm.py, in the class FPGrowthModel, a new method called
>> generateAssociationRules will be added. It will map the rules generated
>> by calling the method "getAssociationRules" from FPGrowthModelWrapper to
>> the namedtuple.
>>
>> Now here is my question: how can I make trainAssociationRules return an
>> FPGrowthModel, so that the wrapper just maps each rule received to its
>> antecedent/consequent? I could not make the method trainAssociationRules
>> return an FPGrowthModel. The wrapper for association rules is in
>> FPGrowthModelWrapper, right?
>>
>> To illustrate, I am thinking of something like this in *PythonMLLibAPI*:
>>
>> def trainAssociationRules(
>>       data: JavaRDD[FPGrowth.FreqItemset[Any]],
>>       minConfidence: Double): [return type] = {
>>
>>     val model = new FPGrowthModel(data.rdd)
>>       .generateAssociationRules(minConfidence)
>>
>>     new FPGrowthModelWrapper(model)
>>   }
>>
>> And in FPGrowthModelWrapper, something like:
>>
>>  def getAssociationRules: [return type] = {
>>     SerDe.fromTuple2RDD(rule.map(x => (x.javaAntecedent, x.javaConsequent)))
>>  }
>>
>> I know this will fail, but what is wrong with my idea?
>> Any suggestions?
>>
>> Thanks for the help and the tips.
>> Caique.
>>
>
>
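
The Python side of the proposal quoted above (points 1.2 and 2) could look roughly like the sketch below. It is plain Python with no Spark dependency, so it only illustrates the mapping step. The names Rule and to_rules and the sample data are hypothetical and are not part of the existing fpm.py; the real method would receive the pairs from the JVM wrapper rather than from a local list.

```python
from collections import namedtuple

# Hypothetical namedtuple from point 1.2: one association rule as an
# (antecedent, consequent) pair. Not an existing PySpark class.
Rule = namedtuple("Rule", ["antecedent", "consequent"])


def to_rules(pairs):
    """Map raw (antecedent, consequent) pairs to Rule namedtuples.

    Stands in for the mapping that generateAssociationRules would apply
    to whatever the wrapper returns (point 2 of the proposal).
    """
    return [Rule(list(antecedent), list(consequent))
            for antecedent, consequent in pairs]


# Made-up sample pairs, standing in for the wrapper's output.
raw_pairs = [
    (["bread", "butter"], ["milk"]),
    (["beer"], ["chips"]),
]

rules = to_rules(raw_pairs)
for rule in rules:
    print(rule.antecedent, "=>", rule.consequent)
```

Keeping Rule as a namedtuple means callers can unpack a rule positionally or read it by field name.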
