spark-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: How to use Mahout VectorWritable in Spark.
Date Thu, 15 May 2014 00:52:27 GMT
PPS: The shell/Spark tutorial I've mentioned is actually being developed in
MAHOUT-1542. As it stands, I believe its core is now complete.


On Wed, May 14, 2014 at 5:48 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> PS: A Spark shell with all the proper imports is also supported natively in
> Mahout (the mahout spark-shell command). See M-1489 for specifics. There's
> also a tutorial, but I suspect it has not yet been finished or published via
> a public link. Again, you need trunk to use the Spark shell there.
>
>
> On Wed, May 14, 2014 at 12:43 AM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:
>
>> Hi Xiangrui,
>> Thanks for the response. I tried a few ways to include the mahout-math jar
>> while launching the Spark shell, but with no success. Can you please point
>> out what I am doing wrong?
>>
>> 1. Exported mahout-math.jar in CLASSPATH and PATH.
>> 2. Launched the Spark shell with:
>>    MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar park-0.9.0/bin/spark-shell
>>
>> After launching, I checked the environment details in the web UI; it looks
>> like the mahout-math jar is included:
>> spark.jars      /home/hduser/installations/work-space/mahout-math-0.7.jar
>>
>> Then I tried:
>> scala> import org.apache.mahout.math.VectorWritable
>> <console>:10: error: object mahout is not a member of package org.apache
>>        import org.apache.mahout.math.VectorWritable
>>
>> scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
>> <console>:12: error: not found: type Text
>>        val data = sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000", classOf[Text], classOf[VectorWritable])
>>
>>                                    ^
>> I'm using Spark 0.9, Hadoop 1.0.4 and Mahout 0.7.
>>
>> Thanks
>> Stuti
>>
>>
>>
>> -----Original Message-----
>> From: Xiangrui Meng [mailto:mengxr@gmail.com]
>> Sent: Wednesday, May 14, 2014 11:56 AM
>> To: user@spark.apache.org
>> Subject: Re: How to use Mahout VectorWritable in Spark.
>>
>> You need
>>
>> > val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
>>
>> to load the data. After that, you can do
>>
>> > val data = raw.values.map(_.get)
>>
>> to get an RDD of Mahout Vectors. You can use `--jar mahout-math.jar`
>> when you launch spark-shell to include mahout-math.
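A minimal sketch of the step after `raw.values.map(_.get)`, assuming the target is MLlib KMeans in Spark 0.9, which (as far as I recall for that release) takes an `RDD[Array[Double]]` rather than a vector type, so each possibly sparse Mahout `Vector` has to be expanded into a dense `Array[Double]`. The code is plain Scala with no Spark or Mahout dependency; `toDense` is a hypothetical helper, and it assumes the non-zero entries have already been pulled out of the Mahout vector as (index, value) pairs.

```scala
object DenseConversion {
  /** Hypothetical helper (not a Mahout or Spark API): expands sparse
    * (index, value) pairs into a dense Array[Double] of length `size`.
    * Positions that are not listed stay 0.0. */
  def toDense(size: Int, nonZeros: Iterable[(Int, Double)]): Array[Double] = {
    val dense = new Array[Double](size) // zero-initialized
    for ((i, v) <- nonZeros) {
      require(i >= 0 && i < size, s"index $i out of range for size $size")
      dense(i) = v
    }
    dense
  }

  def main(args: Array[String]): Unit = {
    val row = toDense(5, Seq(1 -> 2.0, 3 -> 4.0))
    println(row.mkString(", ")) // 0.0, 2.0, 0.0, 4.0, 0.0
  }
}
```

In the shell, a conversion like this would be applied inside a `map` over the loaded values before handing the RDD to KMeans; the cardinality would come from the vector itself.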
>>
>> Best,
>> Xiangrui
>>
>> On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawasthi@hcl.com>
>> wrote:
>> > Hi All,
>> >
>> > I am very new to Spark and trying to play around with MLlib, hence
>> > apologies for the basic question.
>> >
>> >
>> >
>> > I am trying to run the KMeans algorithm using both Mahout and Spark MLlib
>> > to compare their performance. The initial data size was 10 GB. Mahout
>> > converts the data into a SequenceFile <Text,VectorWritable>, which is
>> > used for KMeans clustering. The SequenceFile created was ~6 GB in size.
>> >
>> >
>> >
>> > Now I wanted to know if I can use the Mahout SequenceFile as input to
>> > Spark MLlib KMeans. I have read that SparkContext.sequenceFile may be
>> > used here, so I tried to read my SequenceFile as below, but I get the
>> > following error:
>> >
>> >
>> >
>> > Command in the Spark shell:
>> >
>> > scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>> >
>> > <console>:12: error: not found: type VectorWritable
>> >
>> >        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>> >
>> >
>> >
>> > Here I have two questions:
>> >
>> > 1. Mahout has “Text” as the key type, but Spark prints “not found:
>> > type Text”, hence I changed it to String. Is this correct?
>> >
>> > 2. How will VectorWritable be found in Spark? Do I need to include the
>> > Mahout jar in the classpath, or is there another option?
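On question 1: the `Text` type is Hadoop's `org.apache.hadoop.io.Text`, so it needs an import in the shell rather than being swapped for `String`. One general property of Hadoop input formats worth keeping in mind here: the record reader behind `sequenceFile` typically reuses a single mutable key instance for every record, so keys are usually copied out (for example with `toString`) before being collected or cached. The plain-Scala sketch below only simulates that reuse; `MutableRecord` is a hypothetical stand-in for `Text`, and no Hadoop classes are involved.

```scala
object WritableReuse {
  // Hypothetical stand-in for a mutable Hadoop Writable such as Text:
  // a single instance whose contents are overwritten on each read.
  final class MutableRecord {
    var value: String = ""
    override def toString: String = value
  }

  // Simulates a record reader that hands back the same instance every time.
  def read(lines: Seq[String]): Iterator[MutableRecord] = {
    val shared = new MutableRecord
    lines.iterator.map { line =>
      shared.value = line
      shared
    }
  }

  // Keeps references first and converts later: every element is the same
  // object, so all of them show the last record read.
  def keepReferences(lines: Seq[String]): List[String] =
    read(lines).toList.map(_.toString)

  // Copies each record (here via toString) as soon as it is read.
  def copyEachRecord(lines: Seq[String]): List[String] =
    read(lines).map(_.toString).toList

  def main(args: Array[String]): Unit = {
    val input = Seq("a", "b", "c")
    println(keepReferences(input)) // List(c, c, c)
    println(copyEachRecord(input)) // List(a, b, c)
  }
}
```

In the shell, the corresponding step for keys would be mapping each `Text` key to `key.toString` right after loading.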
>> >
>> >
>> >
>> > Please suggest.
>> >
>> >
>> >
>> > Regards
>> >
>> > Stuti Awasthi
>> >
>> >
>> >
>>
>
>
