mahout-user mailing list archives

From "AlShater, Hani" <halsha...@souq.com>
Subject Re: spark-itemsimilarity out of memory problem
Date Tue, 23 Dec 2014 15:39:00 GMT
@Pat, I am aware of your blog and of Ted's practical machine learning books
and webinars. I have learned a lot
from you guys ;)

@Ted, It is a small 3-node cluster for a POC. Each Spark executor is given 2g
and YARN is configured accordingly. I am trying to avoid Spark memory caching.
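
As a rough illustration of how a 2g executor heap is divided, and why lowering
spark.storage.memoryFraction moves memory away from caching, here is a minimal
Python sketch. It assumes Spark 1.x's static memory model with the commonly
cited defaults (storage fraction 0.6 with a 0.9 safety factor, shuffle fraction
0.2 with a 0.8 safety factor). The function name `executor_budget_mb` is
hypothetical, and the figures are approximations, not measurements from this
cluster.

```python
# Approximate heap split for one Spark 1.x executor (static memory model).
# ASSUMPTION: the default fractions and safety factors below are not stated
# in this thread; they are the commonly cited Spark 1.x defaults.

def executor_budget_mb(heap_mb, storage_fraction=0.6, shuffle_fraction=0.2,
                       storage_safety=0.9, shuffle_safety=0.8):
    """Return approximate MB for cached blocks, shuffle buffers, and the rest."""
    storage = heap_mb * storage_fraction * storage_safety
    shuffle = heap_mb * shuffle_fraction * shuffle_safety
    # Whatever is left is what task code and its objects can use.
    other = heap_mb - storage - shuffle
    return {"storage": storage, "shuffle": shuffle, "other": other}


# Compare the default storage fraction with two lower settings on a 2g heap.
for fraction in (0.6, 0.3, 0.1):
    budget = executor_budget_mb(2048, storage_fraction=fraction)
    print(fraction, {name: round(mb, 1) for name, mb in budget.items()})
```

Lowering the storage fraction leaves more of the heap for task objects, but the
total stays at 2g, so a stage that needs more than that per executor would
still fail.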

@Simon, I am using Mahout and not Spark because I need similarity, not
matrix factorization. Actually, the approach of spark-itemsimilarity gives
a good way to augment content recommendations with collaborative
features. I find this approach more suitable for building a lambda
architecture that supports recommendations based on content, collaborative
features, and recent interactive events, in addition to other injected rules.
I don't think a predefined recommendation server can fit all these
requirements at once; for these reasons I am trying to use Mahout.



Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
Mob: +962 790471101 | Phone: +962 65821236 | Skype:
hani.alshater@outlook.com | halshater@souq.com <lghafri@souq.com> |
www.souq.com
Nouh Al Romi Street, Building number 8, Amman, Jordan


On Tue, Dec 23, 2014 at 5:23 PM, AlShater, Hani <halshater@souq.com> wrote:

> @Pat, Thanks for your answers. It seems that I had cloned the snapshot
> before the feature of configuring Spark was added. It works now in
> local mode. Unfortunately, after trying the new snapshot and Spark,
> submitting to the cluster in yarn-client mode raises the following error:
> Exception in thread "main" java.lang.AbstractMethodError
>     at org.apache.spark.Logging$class.log(Logging.scala:52)
>     at org.apache.spark.deploy.yarn.Client.log(Client.scala:39)
>     at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
>     at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:39)
>     at org.apache.spark.deploy.yarn.Client.logClusterResourceDetails(Client.scala:103)
>     at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:60)
>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:323)
>     at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
>     at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
>     at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
>
> and submitting in yarn-cluster mode raises this error:
> Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1571)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
>     at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
>     at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
>     at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:191)
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1566)
>     ... 10 more
>
> My cluster consists of 3 nodes, and I am using Hadoop 2.4.0. I got
> Spark 1.1.0 and mahout-snapshot, then compiled, packaged, and installed
> them to the local Maven repo. Am I missing something?
>
> Thanks again
>
>
> On Tue, Dec 23, 2014 at 11:17 AM, hlqv <hlqvuong@gmail.com> wrote:
>
>> Hi Pat Ferrel,
>> Using the --omitStrength option outputs indexable data, but this leads to
>> less accuracy while querying because the similarity values between items
>> are omitted. Can these values be included in order to improve accuracy in
>> a search engine?
>>
>> On 23 December 2014 at 02:17, Pat Ferrel <pat@occamsmachete.com> wrote:
>>
>> > Also Ted has an ebook you can download:
>> > mapr.com/practical-machine-learning
>> >
>> > On Dec 22, 2014, at 10:52 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
>> >
>> > Hi Hani,
>> >
>> > I recently read about Souq.com. A very promising project.
>> >
>> > If you are looking at the spark-itemsimilarity for ecommerce type
>> > recommendations you may be interested in some slide decks and blog posts
>> > I’ve done on the subject.
>> > Check out:
>> >
>> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> > http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> > http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> >
>> > Also I put up a demo site that uses some of these techniques:
>> > https://guide.finderbots.com
>> >
>> > Good luck,
>> > Pat
>> >
>> > On Dec 21, 2014, at 11:44 PM, AlShater, Hani <halshater@souq.com> wrote:
>> >
>> > Hi All,
>> >
>> > I am trying to use spark-itemsimilarity on a dataset of 160M user
>> > interactions. The job launches and runs successfully for a small dataset
>> > of 1M actions. However, when trying the larger dataset, some Spark stages
>> > continuously fail with an out of memory exception.
>> >
>> > I tried changing spark.storage.memoryFraction from the Spark default
>> > configuration, but I face the same issue again. How should I configure
>> > Spark when using spark-itemsimilarity, or how can I overcome this out of
>> > memory issue?
>> >
>> > Can you please advise?
>> >
>> > Thanks,
>> > Hani.
>> >
>>
>
>

-- 


*Download free Souq.com <http://souq.com/> mobile apps for iPhone 
<https://itunes.apple.com/us/app/id675000850>, iPad 
<https://itunes.apple.com/ae/app/souq.com/id941561129?mt=8>, Android 
<https://play.google.com/store/apps/details?id=com.souq.app> or Windows 
Phone 
<http://www.windowsphone.com/en-gb/store/app/souq/63803e57-4aae-42c7-80e0-f9e60e33b1bc>
**and never 
miss a deal! *
