mahout-user mailing list archives

From Rohit Jain <rohitkjai...@gmail.com>
Subject Re: Mahout rowSimilarity
Date Wed, 04 May 2016 13:12:22 GMT
I am still looking for an answer. It would be great if somebody
could help me with this :)

On Wed, May 4, 2016 at 11:25 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:

> And if so, could you please explain what exactly you mean by "You
> can then just write some simple pre processing code that converts your
> database files to the appropriate format for Mahout and read it in as an
> indexed dataset."
>
> On Wed, May 4, 2016 at 11:21 AM, Rohit Jain <rohitkjain90@gmail.com>
> wrote:
>
>> Hello Nikaash,
>> So you mean I need to first read the data from my MongoDB using Scala's Mongo
>> driver, convert it into an indexed dataset, and then process it using
>> row similarity?
>>
>> On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri <nikaashpuri@gmail.com>
>> wrote:
>>
>>> Hi Rohit,
>>>
>>> This would be a good place to start.
>>> https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
>>>
>>> This bit of code, in particular, shows how to call spark-rowsimilarity
>>> from Scala:
>>>
>>> val rowSimilarityIDS =
>>> SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)
>>>
>>> You can then just write some simple pre processing code that converts
>>> your database files to the appropriate format for Mahout and read it in as
>>> an indexed dataset.
>>>
>>> This is another great end-to-end example that achieves a similar result
>>> using spark-itemsimilarity.
>>> https://mahout.apache.org/users/environment/how-to-build-an-app.html
>>>
>>> Let me know if you need more help.
>>>
>>> Thank you,
>>> Nikaash Puri
>>> > On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjain90@gmail.com> wrote:
>>> >
>>> > Hello Pat,
>>> > Can you please explain it in a little more detail? I didn't understand
>>> > how to go about it.
>>> >
>>> > On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
>>> >
>>> >> Sure, but at least some would be Scala. There are examples in Mahout that
>>> >> take PairRDDs as input but anything that constructs an IndexedDataset would
>>> >> be fine. I use this code in a system that creates an RDD from HBase. Think
>>> >> of the task as one of how to create a Spark RDD from your DB content.
>>> >>
>>> >> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjain90@gmail.com> wrote:
>>> >>
>>> >> Hello Everyone,
>>> >> I have products, and each product has certain associated tags. To find
>>> >> similar products I am using the Mahout spark-rowsimilarity algorithm in
>>> >> the following manner.
>>> >>
>>> >> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers
>>> >> -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf
>>> >> -ma spark://0.0.0.0:7077
>>> >>
>>> >> To run this command I need to pull the data from the database into a flat
>>> >> file. Is there any way I can use this command, or write Java code, to work
>>> >> directly on the database?
>>> >>
>>> >> --
>>> >> Thanks & Regards,
>>> >>
>>> >> *Rohit Jain*
>>> >> Web developer | Consultant
>>> >> Mob +91 8097283931



-- 
Thanks & Regards,

*Rohit Jain*
Web developer | Consultant
Mob +91 8097283931
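
As a rough illustration of the "simple pre processing code" mentioned in the quoted reply above, the sketch below turns product/tag records into the text lines that spark-rowsimilarity reads by default, which the Mahout documentation describes as rowID<tab>columnID1:strength1<space>columnID2:strength2. The Product case class, the RowSimilarityInput object, and the sample data are hypothetical names made up for this example. Fetching the records from MongoDB (or any other database) is left out, and using a strength of 1 for every tag is an assumption.

```scala
// Hypothetical record type standing in for one product fetched from the database.
final case class Product(id: String, tags: Seq[String])

object RowSimilarityInput {
  // Formats one product as "rowID<tab>tag1:1<space>tag2:1 ...".
  // Assumes IDs and tags contain no tabs, spaces, or colons, since those
  // characters are the default delimiters; a strength of 1 is an assumption.
  def toLine(p: Product): String = {
    val elements = p.tags.distinct.map(tag => s"$tag:1").mkString(" ")
    s"${p.id}\t$elements"
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data; in practice these would come from a database query.
    val products = Seq(
      Product("trouser-1", Seq("cotton", "slim", "blue")),
      Product("trouser-2", Seq("cotton", "regular"))
    )
    products.map(toLine).foreach(println)
  }
}
```

Lines produced this way could be written to HDFS and passed to the command with -i. Alternatively, the same records could be loaded into a Spark RDD and used to construct an IndexedDataset, as suggested earlier in the thread.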
