mahout-user mailing list archives

From Nikaash Puri <nikaashp...@gmail.com>
Subject Re: Mahout rowSimilarity
Date Wed, 04 May 2016 02:26:11 GMT
Hi Rohit,

This would be a good place to start:
https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala

This bit of code, in particular, shows how to call spark-rowsimilarity from Scala:

val rowSimilarityIDS = SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)

You can then write some simple preprocessing code that converts your database files
to the appropriate format for Mahout and reads them in as an indexed dataset.
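
As a rough illustration of that preprocessing step, here is a minimal sketch in plain Scala with no Spark or Mahout dependencies. The object name TagRows and its two helpers are hypothetical, and the in-memory Seq stands in for whatever your database query returns. It assumes one (productId, tag) pair per database row, and it assumes the default text layout of spark-rowsimilarity is a row ID, a tab, then space-separated column IDs, so please check that against the docs for your Mahout version.

```scala
// Hypothetical helper, not part of Mahout: shapes (productId, tag) rows
// pulled from a database into forms that are easy to hand to Spark.
object TagRows {

  /** Trim values, drop blank entries and duplicates, keep first-seen order. */
  def toElements(rows: Seq[(String, String)]): Seq[(String, String)] =
    rows
      .map { case (product, tag) => (product.trim, tag.trim) }
      .filter { case (product, tag) => product.nonEmpty && tag.nonEmpty }
      .distinct

  /**
   * One line per product: "product<TAB>tag1 tag2 ...".
   * Assumed default delimiters; tags containing spaces or tabs would
   * need to be escaped or replaced before using this layout.
   */
  def toLines(elements: Seq[(String, String)]): Seq[String] =
    elements
      .groupBy { case (product, _) => product }
      .toSeq
      .sortBy { case (product, _) => product }
      .map { case (product, pairs) =>
        product + "\t" + pairs.map(_._2).sorted.mkString(" ")
      }
}
```

The pairs returned by toElements are the kind of data you would turn into a Spark RDD of (rowID, columnID) strings before building the indexed dataset, which avoids the flat file entirely. The lines from toLines are only useful if you still want to feed the command-line driver.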

This is another great end-to-end example that achieves a similar result using spark-itemsimilarity:
https://mahout.apache.org/users/environment/how-to-build-an-app.html

Let me know if you need more help.

Thank you,
Nikaash Puri
> On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjain90@gmail.com> wrote:
> 
> Hello Pat,
> Can you please explain it in a little more detail? I didn't understand how to go
> about it.
> 
> On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
>> Sure, but at least some would be Scala. There are examples in Mahout that
>> take PairRDDs as input but anything that constructs an IndexedDataset would
>> be fine. I use this code in a system that creates an RDD from HBase. Think
>> of the task as one of how to create a Spark RDD from your DB content.
>> 
>> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:
>> 
>> Hello Everyone,
>> I have products, and each product has certain associated tags. To find
>> similar products, I am using the Mahout spark-rowsimilarity algorithm in
>> the following manner.
>> 
>> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers \
>>   -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf \
>>   -ma spark://0.0.0.0:7077
>> To run this command I need to pull the data from the database into a flat
>> file. Is there any way I can use this command, or write Java code, to work
>> directly on the database?
>> 
>> --
>> Thanks & Regards,
>> 
>> *Rohit Jain*
>> Web developer | Consultant
>> Mob +91 8097283931
>> 
>> 
> 
> 
> -- 
> Thanks & Regards,
> 
> *Rohit Jain*
> Web developer | Consultant
> Mob +91 8097283931

