spark-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: How to use Mahout VectorWritable in Spark.
Date Thu, 15 May 2014 00:43:45 GMT
Mahout now supports doing its distributed linalg natively on Spark, so the
problem of loading sequence file input into Spark is already solved there
(in trunk; see http://mahout.apache.org/users/sparkbindings/home.html and the
drmFromHDFS() call -- you can then get at the underlying rdd via the matrix's
"rdd" property if needed).
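Rough sketch of that path (assuming the trunk Spark bindings jars are on the
classpath; exact names and signatures may differ between builds):

import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

// Wrap a Spark context in Mahout's distributed context (the Mahout spark-shell
// normally provides this implicitly).
implicit val mahoutCtx = mahoutSparkContext(masterUrl = "local[4]",
  appName = "mahout-seqfile-load")

// Load the <Text, VectorWritable> sequence file as a distributed row matrix (DRM).
val drm = drmFromHDFS("/KMeans_dataset_seq/part-r-00000")

// The underlying Spark RDD is exposed via the "rdd" property if needed.
val rowsRdd = drm.rdd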

If you specifically want interoperability with MLlib, however, I have not tried
that. Mahout's linalg and its Spark bindings work with the Kryo serializer only,
so if/when MLlib algorithms do not support the Kryo serializer, they would not
be interoperable.
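
If you want to try anyway, something along these lines should read the same
sequence file with the plain Spark API and feed it to MLlib's KMeans -- untested
on my side; the dense-vector conversion and the k / iteration values are just
assumptions for illustration:

import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val conf = new SparkConf()
  .setAppName("mahout-seqfile-to-mllib")
  // Kryo is what Mahout's bindings require; worth setting when mixing the two.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// Use classOf[...] for the key/value types, not the type names themselves.
val seq = sc.sequenceFile("/KMeans_dataset_seq/part-r-00000",
  classOf[Text], classOf[VectorWritable])

// Hadoop reuses Writable instances, so convert to an immutable MLlib vector
// right away inside the map (assumes dense data).
val points = seq.map { case (_, vw) =>
  val v = vw.get()
  Vectors.dense((0 until v.size()).map(i => v.getQuick(i)).toArray)
}

val model = KMeans.train(points, 10, 20)  // hypothetical k and max iterations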

-d


On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:

>  Hi All,
>
> I am very new to Spark and trying to play around with MLlib, hence
> apologies for the basic question.
>
>
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to see
> the performance. The initial data size was 10 GB. Mahout converts the data
> into a Sequence File <Text,VectorWritable> which is used for KMeans
> clustering. The Sequence File created was ~6 GB in size.
>
>
>
> Now I wanted to see if I can use the Mahout sequence file in Spark MLlib for
> KMeans. I have read that SparkContext.sequenceFile may be used here, hence I
> tried to read my sequence file as below, but I am getting this error:
>
>
>
> Command on Spark Shell :
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> <console>:12: error: not found: type VectorWritable
>
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
>
>
> Here I have 2 questions:
>
> 1.  Mahout has “Text” as the key, but Spark was printing “not found: type: Text”,
> hence I changed it to String. Is this correct?
>
> 2. How will VectorWritable be found in Spark? Do I need to include the Mahout
> jar in the classpath, or is there another option?
>
>
>
> Please Suggest
>
>
>
> Regards
>
> Stuti Awasthi
>
>
>
