spark-user mailing list archives

From "Devi P.V" <devip2...@gmail.com>
Subject FP growth - Items in a transaction must be unique
Date Thu, 02 Feb 2017 07:17:22 GMT
Hi all,

I am trying to run the FP-growth algorithm using Spark and Scala. A sample of
the input dataframe follows:

+-------------------------------------------------------------------------------------------+
|productName                                                                                |
+-------------------------------------------------------------------------------------------+
|Apple Iphone 7 128GB Jet Black with Facetime                                               |
|Levi’s Blue Slim Fit Jeans- L5112,Rimmel London Lasting Finish Matte by Kate Moss 101 Dusky|
|Iphone 6 Plus (5.5",Limited Stocks, TRA Oman Approved)                                     |
+-------------------------------------------------------------------------------------------+

Each row contains unique items.

I converted it into an RDD as follows:

val transactions = names.as[String].rdd.map(s =>s.split(","))

val fpg = new FPGrowth().
  setMinSupport(0.3).
  setNumPartitions(100)


val model = fpg.run(transactions)

But I got the following error:

WARN TaskSetManager: Lost task 2.0 in stage 27.0 (TID 622, localhost):
org.apache.spark.SparkException:
Items in a transaction must be unique but got WrappedArray(
Huawei GR3 Dual Sim 16GB 13MP 5Inch 4G,
 Huawei G8 Gold 32GB,  4G,
5.5 Inches, HTC Desire 816 (Dual Sim, 3G, 8GB),
 Samsung Galaxy S7 Single Sim - 32GB,  4G LTE,
Gold, Huawei P8 Lite 16GB,  4G LTE, Huawei Y625,
Samsung Galaxy Note 5 - 32GB,  4G LTE,
Samsung Galaxy S7 Dual Sim - 32GB)


How can I solve this?
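
One possible direction, as a minimal sketch only: the array in the error
contains the fragment "4G LTE" more than once, which suggests that splitting
on "," also cuts product names at their internal commas and leaves repeated
pieces in one transaction. Trimming each piece and calling distinct would
give FPGrowth unique items per transaction. The helper name toTransaction and
the shortened sample row are assumptions for illustration, not part of the
original code.

```scala
// Hypothetical helper (not from the original post): turn one comma-separated
// row into a transaction whose items are unique.
def toTransaction(row: String): Array[String] =
  row.split(",")        // same delimiter as in the original code
    .map(_.trim)        // "  4G LTE" and "4G LTE" become the same item
    .filter(_.nonEmpty) // drop empty fragments
    .distinct           // remove repeated items within the transaction

// Shortened, made-up row in the style of the one in the error message.
val sample =
  "Huawei G8 Gold 32GB,  4G, Samsung Galaxy S7 Single Sim - 32GB,  4G LTE, Huawei P8 Lite 16GB,  4G LTE"
val items = toTransaction(sample)

// Assumed usage with the RDD from the post:
//   val transactions = names.as[String].rdd.map(toTransaction)
```

This only removes the duplicates. Product names that themselves contain
commas are still broken into fragments such as "4G LTE", so a delimiter that
cannot occur inside a product name would be needed for meaningful itemsets.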


Thanks
