spark-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: How to use Mahout VectorWritable in Spark.
Date Thu, 15 May 2014 00:48:37 GMT
PS: a Spark shell with all the proper imports is also supported natively in
Mahout (mahout spark-shell command). See M-1489 for specifics. There's also
a tutorial somewhere, but I suspect it has not been finished or published
via a public link yet. Again, you need trunk to use the Spark shell there.


On Wed, May 14, 2014 at 12:43 AM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:

> Hi Xiangrui,
> Thanks for the response. I tried a few ways to include the mahout-math jar
> while launching the Spark shell, but with no success. Can you please point
> out what I am doing wrong?
>
> 1. Exported mahout-math.jar in CLASSPATH and PATH.
> 2. Tried launching the Spark shell with:
>    MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar spark-0.9.0/bin/spark-shell
>
> After launching, I checked the environment details in the web UI. It looks
> like the mahout-math jar is included:
> spark.jars      /home/hduser/installations/work-space/mahout-math-0.7.jar
>
> Then I tried:
> scala> import org.apache.mahout.math.VectorWritable
> <console>:10: error: object mahout is not a member of package org.apache
>        import org.apache.mahout.math.VectorWritable
>
> scala> val raw = sc.sequenceFile(path, classOf[Text],
> classOf[VectorWritable])
> <console>:12: error: not found: type Text
>        val data =
> sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000",
> classOf[Text], classOf[VectorWritable])
>
>                                    ^
> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>
> Thanks
> Stuti
>
>
>
> -----Original Message-----
> From: Xiangrui Meng [mailto:mengxr@gmail.com]
> Sent: Wednesday, May 14, 2014 11:56 AM
> To: user@spark.apache.org
> Subject: Re: How to use Mahout VectorWritable in Spark.
>
> You need
>
> > val raw = sc.sequenceFile(path, classOf[Text],
> > classOf[VectorWritable])
>
> to load the data. After that, you can do
>
> > val data = raw.values.map(_.get)
>
> to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when
> you launch spark-shell to include mahout-math.
>
> Best,
> Xiangrui
>
> On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawasthi@hcl.com>
> wrote:
> > Hi All,
> >
> > I am very new to Spark and trying to play around with MLlib, hence
> > apologies for the basic question.
> >
> > I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
> > compare performance. The initial data size was 10 GB. Mahout converts
> > the data into a SequenceFile <Text,VectorWritable>, which is used for
> > KMeans clustering. The SequenceFile created was ~6 GB in size.
> >
> > Now I want to know whether I can use the Mahout SequenceFile as input to
> > Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
> > may be used here. Hence I tried to read my SequenceFile as below, but I
> > am getting the error:
> >
> >
> >
> > Command in the Spark shell:
> >
> > scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> > <console>:12: error: not found: type VectorWritable
> >        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> >
> >
> >
> > Here I have 2 questions:
> >
> > 1. Mahout has “Text” as the key, but Spark prints “not found: type Text”,
> > so I changed it to String. Is this correct?
> >
> > 2. How will VectorWritable be found in Spark? Do I need to include the
> > Mahout jar in the classpath, or is there another option?
> >
> >
> >
> > Please Suggest
> >
> >
> >
> > Regards
> >
> > Stuti Awasthi
>
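Putting the suggestions in this thread together, the steps might look roughly like the sketch below. It is an untested illustration, not a confirmed recipe. It assumes Spark 1.0 or later (where `KMeans.train` accepts an `RDD[Vector]`), a spark-shell session in which the mahout-math jar is visible to both the driver and the executors, and vectors small enough to be copied into dense arrays. The helper name `toMLlib` and the values of `k` and `maxIterations` are placeholders introduced only for this example. Note that the key class is Hadoop's `org.apache.hadoop.io.Text`, which has to be imported; it is not interchangeable with `String`.

```scala
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

// `sc` is the SparkContext that spark-shell creates; the shell also brings
// the pair-RDD implicits needed for `.values` into scope.
val path = "/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000"

// Keys are Hadoop Text and values are Mahout VectorWritable.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Hypothetical helper: copy a Mahout vector into a dense MLlib vector.
def toMLlib(vw: VectorWritable): MLVector = {
  val v = vw.get()
  Vectors.dense(Array.tabulate(v.size())(i => v.get(i)))
}

// Hadoop record readers reuse Writable instances, so convert each record
// to fresh objects before caching or collecting anything.
val keyed = raw.map { case (key, vw) => (key.toString, toMLlib(vw)) }
val points = keyed.values.cache()

// Placeholder settings; choose values that suit the dataset.
val k = 6
val maxIterations = 20
val model = KMeans.train(points, k, maxIterations)
```

Converting inside the first `map` keeps Mahout types out of any later shuffle or cache, so only plain strings and MLlib vectors need to be serialized. On Spark 0.9, `KMeans.train` expects an `RDD[Array[Double]]` instead, so the helper would return the array directly.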
