spark-user mailing list archives

From Patrick Plaatje <patr...@bazana.com>
Subject Re: FP growth - Items in a transaction must be unique
Date Thu, 02 Feb 2017 11:22:21 GMT
Hi,

 

This error indicates that at least one row of your DataFrame contains the same product more
than once. The FPGrowth implementation requires the items within each transaction to be unique,
so you will need to remove the duplicate products from each row before running the algorithm.
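
A minimal sketch of such a dedupe step, assuming the items in a row are comma-separated as in the question below. The names TransactionPrep and toTransaction are hypothetical, and trimming whitespace before comparing items is an assumption about how duplicates should be detected. It is plain Scala, so it can be tried without a Spark cluster.

```scala
// Hypothetical helper: turns one comma-separated row into a
// transaction whose items are unique, as FPGrowth requires.
object TransactionPrep {
  def toTransaction(row: String): Array[String] =
    row
      .split(",")         // same delimiter as in the question
      .map(_.trim)        // assumption: " 4G LTE" and "4G LTE" are the same item
      .filter(_.nonEmpty) // drop empty fragments such as "a,,b" would produce
      .distinct           // keep only the first occurrence of each item
}
```

In the code from the question this would presumably take the place of the split step, for example names.as[String].rdd.map(TransactionPrep.toTransaction). The array in the error message contains fragments such as "4G LTE" more than once, which suggests that some product names themselves contain commas. Deduplicating should let FPGrowth accept the input, but a delimiter that cannot occur inside a product name would likely give more meaningful itemsets.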

 

Best,

Patrick

 

From: "Devi P.V" <devip2136@gmail.com>
Date: Thursday, 2 February 2017 at 07:17
To: "user @spark" <user@spark.apache.org>
Subject: FP growth - Items in a transaction must be unique

 

Hi all,

I am trying to run the FP-growth algorithm using Spark and Scala. A sample of the input DataFrame is as follows:

+-------------------------------------------------------------------------------------------+
|productName                                                                                |
+-------------------------------------------------------------------------------------------+
|Apple Iphone 7 128GB Jet Black with Facetime                                               |
|Levi’s Blue Slim Fit Jeans- L5112,Rimmel London Lasting Finish Matte by Kate Moss 101 Dusky|
|Iphone 6 Plus (5.5",Limited Stocks, TRA Oman Approved)                                     |
+-------------------------------------------------------------------------------------------+

Each row contains unique items.

 

I converted it into an RDD as follows:
val transactions = names.as[String].rdd.map(s => s.split(","))

import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth().
  setMinSupport(0.3).
  setNumPartitions(100)

val model = fpg.run(transactions)
But I got the following error:

WARN TaskSetManager: Lost task 2.0 in stage 27.0 (TID 622, localhost):
org.apache.spark.SparkException: 
Items in a transaction must be unique but got WrappedArray(
Huawei GR3 Dual Sim 16GB 13MP 5Inch 4G,
 Huawei G8 Gold 32GB,  4G,  
5.5 Inches, HTC Desire 816 (Dual Sim, 3G, 8GB),
 Samsung Galaxy S7 Single Sim - 32GB,  4G LTE,  
Gold, Huawei P8 Lite 16GB,  4G LTE, Huawei Y625, 
Samsung Galaxy Note 5 - 32GB,  4G LTE, 
Samsung Galaxy S7 Dual Sim - 32GB)

How can I solve this?

Thanks
