mahout-user mailing list archives

From Rohit Jain <rohitkjai...@gmail.com>
Subject Re: Mahout rowSimilarity
Date Wed, 04 May 2016 05:51:54 GMT
Hello Nikaash,
So you mean I need to first read the data from my MongoDB using Scala's Mongo
driver, then convert it into an IndexedDataset, and then process it using
row similarity?
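
To make the conversion step concrete, here is a minimal sketch in plain Scala of how product records might be flattened into (row ID, column ID) string pairs, the kind of pair input Pat mentions. The names `Product`, `TagPairs` and `toPairs` are hypothetical, and reading from MongoDB is not shown: the in-memory `Seq` stands in for whatever the Mongo driver returns. On a cluster the pairs would presumably live in a Spark RDD and be used to build the IndexedDataset passed to `SimilarityAnalysis.rowSimilarityIDS`, but the exact constructor should be checked against the Mahout version in use.

```scala
// Hypothetical record type standing in for a product document read from the database.
final case class Product(id: String, tags: Seq[String])

object TagPairs {
  // Flatten each product into (rowID, columnID) pairs:
  // one pair per distinct, non-empty tag, with the product ID as the row ID.
  def toPairs(products: Seq[Product]): Seq[(String, String)] =
    for {
      p   <- products
      tag <- p.tags.map(_.trim).filter(_.nonEmpty).distinct
    } yield (p.id, tag)

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val products = Seq(
      Product("p1", Seq("cotton", "slim-fit")),
      Product("p2", Seq("cotton", "regular", "cotton"))
    )
    // Print one tab-separated pair per line.
    toPairs(products).foreach { case (row, col) => println(s"$row\t$col") }
  }
}
```

Duplicate and blank tags are dropped here so that each product/tag combination appears only once; whether that is the right choice depends on how the tags are stored.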

On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri <nikaashpuri@gmail.com> wrote:

> Hi Rohit,
>
> This would be a good place to start.
> https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
>
> This bit of code, in particular, shows how to call spark-rowsimilarity
> from Scala:
>
> val rowSimilarityIDS =
> SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)
>
> You can then just write some simple preprocessing code that converts your
> database files to the appropriate format for Mahout and reads them in as an
> indexed dataset.
>
> This is another great end-to-end example that achieves a similar result
> using spark-itemsimilarity:
> https://mahout.apache.org/users/environment/how-to-build-an-app.html
>
> Let me know if you need more help.
>
> Thank you,
> Nikaash Puri
> > On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjain90@gmail.com> wrote:
> >
> > Hello Pat,
> > Can you please explain it in a little more detail? I didn't understand how
> > to go about it.
> >
> > On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
> >
> >> Sure, but at least some would be Scala. There are examples in Mahout that
> >> take PairRDDs as input but anything that constructs an IndexedDataset would
> >> be fine. I use this code in a system that creates an RDD from HBase. Think
> >> of the task as one of how to create a Spark RDD from your DB content.
> >>
> >> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:
> >>
> >> Hello Everyone,
> >> I have products, and each product has certain tags associated with it. To
> >> find similar products I am using Mahout's spark-rowsimilarity algorithm in
> >> the following manner.
> >>
> >> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers \
> >>   -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf \
> >>   -ma spark://0.0.0.0:7077
> >>
> >> To run this command I need to pull the data from the database into a flat
> >> file. Is there any way I can use this command, or write Java code, to work
> >> directly on the database?
> >>
> >> --
> >> Thanks & Regards,
> >>
> >> *Rohit Jain*
> >> Web developer | Consultant
> >> Mob +91 8097283931
> >>
> >>
> >
> >
> > --
> > Thanks & Regards,
> >
> > *Rohit Jain*
> > Web developer | Consultant
> > Mob +91 8097283931
>
>


-- 
Thanks & Regards,

*Rohit Jain*
Web developer | Consultant
Mob +91 8097283931
