mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: spark-itemsimilarity out of memory problem
Date Tue, 23 Dec 2014 16:14:28 GMT
Why do you say it will lead to less accuracy?

The weights are LLR weights, and they are used to filter and downsample the indicator matrix.
Once the downsampling is done they are no longer needed. When you index the indicators in a
search engine they will get TF-IDF weights, and this is a good effect: it downweights very
popular items, which hold little value as indicators of a user’s taste.
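
To make the two weighting steps concrete, here is a minimal sketch in Python. It is not Mahout's code: the function names are hypothetical, the LLR uses the standard entropy form of the log-likelihood ratio on a 2x2 co-occurrence table, and the IDF is the simplified log(N / df) rather than the exact formula any particular search engine uses.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: users who interacted with both items A and B
    # k12: users with A but not B, k21: users with B but not A
    # k22: users with neither
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


def top_indicators(scores, max_per_item):
    # Downsample: keep only the ids of the strongest indicators.
    # The scores themselves are dropped, as with --omitStrength.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_per_item]


def idf(num_docs, doc_freq):
    # Simplified inverse document frequency (an assumption, see above):
    # an item that appears in many indicator lists gets a low weight.
    return math.log(num_docs / doc_freq)
```

For example, `top_indicators({"a": 5.0, "b": 12.0, "c": 0.5}, 2)` keeps `b` and `a` and discards their scores, and `idf(1000, 900)` is much smaller than `idf(1000, 10)`, which is the downweighting of popular items described above.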

On Dec 23, 2014, at 1:17 AM, hlqv <hlqvuong@gmail.com> wrote:

Hi Pat Ferrel
I used the --omitStrength option to output indexable data, but this leads to
less accuracy when querying because the similarity values between items are
omitted. Is there a way to include these values in order to improve accuracy
in a search engine?

On 23 December 2014 at 02:17, Pat Ferrel <pat@occamsmachete.com> wrote:

> Also Ted has an ebook you can download:
> mapr.com/practical-machine-learning
> 
> On Dec 22, 2014, at 10:52 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
> Hi Hani,
> 
> I recently read about Souq.com. A very promising project.
> 
> If you are looking at spark-itemsimilarity for ecommerce-type
> recommendations you may be interested in some slide decks and blog posts
> I’ve done on the subject.
> Check out:
> 
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> 
> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> 
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> 
> Also I put up a demo site that uses some of these techniques:
> https://guide.finderbots.com
> 
> Good luck,
> Pat
> 
> On Dec 21, 2014, at 11:44 PM, AlShater, Hani <halshater@souq.com> wrote:
> 
> Hi All,
> 
> I am trying to use spark-itemsimilarity on a dataset of 160M user
> interactions. The job launches and runs successfully on a small dataset of
> 1M actions. However, when I try the larger dataset, some Spark stages
> continuously fail with an out-of-memory exception.
> 
> I tried changing spark.storage.memoryFraction from the Spark default
> configuration, but I hit the same issue again. How should I configure Spark
> when using spark-itemsimilarity, or how can I overcome this out-of-memory
> issue?
> 
> Can you please advise?
> 
> Thanks,
> Hani.
> 
> Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
> Mob: +962 790471101 | Phone: +962 65821236 | Skype:
> hani.alshater@outlook.com | halshater@souq.com <lghafri@souq.com> |
> www.souq.com
> Nouh Al Romi Street, Building number 8, Amman, Jordan
> 

