mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Mahout rowSimilarity
Date Wed, 04 May 2016 20:28:44 GMT
Here is an example that takes a PairRDD, which is an RDD of pairs of strings. Each pair holds a row-id
and a column-id, so this method inputs each element of the sparse matrix individually.
If the row-id is a user-id and the column-id is an item-id, it will turn the pairs into an IndexedDatasetSpark,
which is essentially 2 BiMaps (one for users, one for items) and a DRM. Once you have the
IndexedDataset, pass it to SimilarityAnalysis.
https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/indexeddataset/IndexedDatasetSpark.scala#L68
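To make the "2 BiMaps and a DRM" description concrete, here is a minimal plain-Scala model of the idea, assuming nothing from Mahout. `BiDict`, `IndexedPairs` and `fromPairs` are hypothetical names, treating each (row-id, column-id) pair as a single 1.0 entry is an assumption for illustration, and indexes are assigned in order of first appearance only because that makes this sketch deterministic. It is not Mahout's IndexedDatasetSpark implementation.

```scala
// Plain-Scala model of what an indexed dataset holds: a row dictionary,
// a column dictionary, and a sparse matrix. BiDict and IndexedPairs are
// hypothetical names; this is NOT Mahout's IndexedDatasetSpark.

/** Minimal bidirectional dictionary: external string ID <-> Int index. */
final case class BiDict(toIndex: Map[String, Int]) {
  // Inverse view, Int index -> external string ID
  val toId: Map[Int, String] = toIndex.map(_.swap)
  def size: Int = toIndex.size
}

object BiDict {
  /** Assigns consecutive indexes in order of first appearance. */
  def from(ids: Seq[String]): BiDict =
    BiDict(ids.distinct.zipWithIndex.toMap)
}

/** Row IDs, column IDs, and a sparse matrix stored as row -> (column -> value). */
final case class IndexedPairs(
    rowIds: BiDict,
    columnIds: BiDict,
    matrix: Map[Int, Map[Int, Double]])

object IndexedPairs {
  /** Each (rowId, columnId) pair becomes one 1.0 entry; duplicate pairs collapse. */
  def fromPairs(pairs: Seq[(String, String)]): IndexedPairs = {
    val rows = BiDict.from(pairs.map(_._1))
    val cols = BiDict.from(pairs.map(_._2))
    val matrix: Map[Int, Map[Int, Double]] = pairs
      .map { case (r, c) => (rows.toIndex(r), cols.toIndex(c)) }
      .distinct
      .groupBy { case (r, _) => r }
      .map { case (r, cells) => r -> cells.map { case (_, c) => c -> 1.0 }.toMap }
    IndexedPairs(rows, cols, matrix)
  }
}
```

In the real pipeline the pairs live in a Spark RDD and the linked IndexedDatasetSpark code builds the dictionaries and the DRM for you; the resulting IndexedDataset is what goes to SimilarityAnalysis. The model only shows why row and column IDs can be arbitrary strings: the dictionaries translate them to and from the integer indexes the matrix needs.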


On May 4, 2016, at 6:12 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:

I am still searching for an answer. It would be great if somebody
could help me with this :)

On Wed, May 4, 2016 at 11:25 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:

> And if so, can you please explain what exactly you mean by "You
> can then just write some simple pre processing code that converts your
> database files to the appropriate format for Mahout and read it in as an
> indexed dataset."
> 
> On Wed, May 4, 2016 at 11:21 AM, Rohit Jain <rohitkjain90@gmail.com>
> wrote:
> 
>> Hello Nikaash,
>> So you mean I need to first read data from my MongoDB using Scala's Mongo
>> driver, convert it into an indexed dataset, and then process it using
>> row similarity?
>> 
>> On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri <nikaashpuri@gmail.com>
>> wrote:
>> 
>>> Hi Rohit,
>>> 
>>> This would be a good place to start.
>>> https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
>>> 
>>> This bit of code, in particular, is how to call spark-rowsimilarity
>>> from Scala:
>>> 
>>> val rowSimilarityIDS =
>>> SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)
>>> 
>>> You can then just write some simple pre processing code that converts
>>> your database files to the appropriate format for Mahout and read it in as
>>> an indexed dataset.
>>> 
>>> This is another great end-to-end example that achieves a similar result
>>> using spark-itemsimilarity.
>>> https://mahout.apache.org/users/environment/how-to-build-an-app.html
>>> 
>>> Let me know if you need more help.
>>> 
>>> Thank you,
>>> Nikaash Puri
>>>> On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjain90@gmail.com> wrote:
>>>> 
>>>> Hello Pat,
>>>> Can you please explain it in a little more detail? I didn't understand how to
>>>> go about it.
>>>> 
>>>> On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>> 
>>>>> Sure, but at least some of it would be Scala. There are examples in Mahout that
>>>>> take PairRDDs as input, but anything that constructs an IndexedDataset would
>>>>> be fine. I use this code in a system that creates an RDD from HBase. Think
>>>>> of the task as one of how to create a Spark RDD from your DB content.
>>>>> 
>>>>> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:
>>>>> 
>>>>> Hello Everyone,
>>>>> I have products, and each product has certain associated tags. To find
>>>>> similar products I am using the Mahout spark-rowsimilarity algorithm in the
>>>>> following manner:
>>>>> 
>>>>> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers \
>>>>>   -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf \
>>>>>   -ma spark://0.0.0.0:7077
>>>>> 
>>>>> To run this command I need to export the data from the database to a flat
>>>>> file. Is there any way I can use this command, or write Java code, to work
>>>>> on the database directly?
>>>>> 
>>>>> --
>>>>> Thanks & Regards,
>>>>> 
>>>>> *Rohit Jain*
>>>>> Web developer | Consultant
>>>>> Mob +91 8097283931
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> 




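Nikaash's suggestion of "simple pre processing code that converts your database files to the appropriate format for Mahout" can be sketched in plain Scala as flattening each product record into (row-id, column-id) pairs, the shape the PairRDD-based IndexedDatasetSpark code expects. This is a sketch under stated assumptions: `ProductRecord` is a hypothetical stand-in for whatever the MongoDB driver returns, `TagPairs`, `normalize` and `toPairs` are hypothetical names, and trimming and lower-casing tags is a design choice rather than anything Mahout requires.

```scala
import java.util.Locale

// Hypothetical pre-processing step: flatten product records, as they might
// come back from a database query, into (rowId, columnId) string pairs.
// ProductRecord is an assumed shape, not a MongoDB or Mahout type.
final case class ProductRecord(id: String, tags: Seq[String])

object TagPairs {
  /** Trims and lower-cases a tag so "Cotton " and "cotton" become one column ID. */
  def normalize(tag: String): String = tag.trim.toLowerCase(Locale.ROOT)

  /**
   * Emits one (productId, tag) pair per distinct, non-empty tag.
   * Product IDs become row IDs and tags become column IDs, so row
   * similarity compares products by the tags they share.
   */
  def toPairs(records: Seq[ProductRecord]): Seq[(String, String)] =
    records.flatMap { r =>
      r.tags.map(normalize).filter(_.nonEmpty).distinct.map(tag => (r.id, tag))
    }
}
```

With Spark, the pairs would then become an RDD (for small data, `sc.parallelize(pairs)`; for large collections, ideally built directly through a Spark connector for the database, if one is available) and be handed to the IndexedDatasetSpark code Pat links, and from there to SimilarityAnalysis.rowSimilarityIDS. That removes the flat-file export step from the CLI workflow.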
