spark-user mailing list archives

From Stuti Awasthi
Subject How to use Mahout VectorWritable in Spark.
Date Wed, 14 May 2014 05:37:45 GMT
Hi All,

I am very new to Spark and am trying out MLlib, so apologies for the basic
questions.

I am trying to run the KMeans algorithm using both Mahout and Spark MLlib to compare their
performance. The initial data size was 10 GB. Mahout converts the data into a sequence file
of <Text,VectorWritable>, which is used for KMeans clustering. The sequence file created was
~6 GB in size.

Now I wanted to know whether the Mahout sequence file can be used as input to Spark MLlib
KMeans. I have read that SparkContext.sequenceFile may be used here, so I tried to read my
sequence file as below, but I am getting this error:

Command on Spark Shell :
scala> val data = sc.sequenceFile[String,VectorWritable]("/ KMeans_dataset_seq/part-r-00000",String,VectorWritable)
<console>:12: error: not found: type VectorWritable
       val data = sc.sequenceFile[String,VectorWritable](" /KMeans_dataset_seq/part-r-00000",String,VectorWritable)

Here I have 2 questions:
1.  Mahout has "Text" as the key type, but Spark prints "not found: type Text", so I changed
it to String. Is this correct?
2.  How will VectorWritable be found in Spark? Do I need to include the Mahout jar in the
classpath, or is there another option?
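
For reference, here is a minimal sketch of what the read might look like. It is untested
and rests on several assumptions: that the jar providing org.apache.mahout.math.VectorWritable
was already on the spark-shell classpath when the shell started, that "Text" refers to
org.apache.hadoop.io.Text, and that the path below (written without the stray space) is the
intended one. The names raw and data are only illustrative.

```scala
// Sketch only: assumes the Mahout jar that provides VectorWritable is on the
// spark-shell classpath, and that `sc` is the SparkContext the shell creates.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

// sequenceFile expects Class objects (classOf[...]) for the key and value,
// not bare type names. The path is assumed, copied from the command above.
val raw = sc.sequenceFile("/KMeans_dataset_seq/part-r-00000",
                          classOf[Text], classOf[VectorWritable])

// Hadoop reuses Writable instances between records, so copy the contents
// into plain Scala values before doing anything else with them.
val data = raw.map { case (key, vw) =>
  val v = vw.get()                                   // underlying Mahout Vector
  (key.toString, Array.tabulate(v.size())(i => v.get(i)))
}

data.take(1)
```

Copying each vector into a dense Array[Double] keeps the sketch simple, but it could be
wasteful if the Mahout vectors are sparse with a large cardinality.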

Please suggest.

Stuti Awasthi

