mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: spark-itemsimilarity out of memory problem
Date Tue, 23 Dec 2014 17:16:08 GMT
Both errors happen when the Spark Context is created using Yarn. I have no experience with
Yarn, so I would try it in standalone cluster mode first. Then, if all is well, check this
page to make sure the Spark cluster is configured correctly for Yarn:
https://spark.apache.org/docs/1.1.0/running-on-yarn.html

Are you able to run Spark examples using Yarn? If so, maybe some of the Yarn config needs to
be passed into the SparkConf using the -D:key=value option.
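
To make the -D:key=value idea concrete, here is a rough Python sketch of how such arguments could be collected into a map of Spark properties. It is only an illustration, not Mahout's actual option parser: the function name and the example values are assumptions, and the property names are ordinary Spark settings (spark.storage.memoryFraction is the one mentioned later in this thread).

```python
def parse_spark_overrides(args):
    """Collect -D:key=value arguments into a dict of Spark properties.

    Hypothetical helper for illustration; not part of Mahout or Spark.
    """
    overrides = {}
    for arg in args:
        if not arg.startswith("-D:"):
            continue  # not an override; other options are ignored here
        key, sep, value = arg[len("-D:"):].partition("=")
        if not sep or not key:
            raise ValueError(f"expected -D:key=value, got {arg!r}")
        overrides[key] = value
    return overrides


# Example command-line fragment; the values are made up.
example_args = [
    "--master", "yarn-client",
    "-D:spark.executor.memory=4g",
    "-D:spark.storage.memoryFraction=0.3",
]
overrides = parse_spark_overrides(example_args)
```

Each resulting key/value pair would then be set on the SparkConf before the context is created, which is the stage where both of the errors below occur.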

I’m very interested in helping with this. It has to work on Hadoop+Spark+Yarn, so if it looks
like a change needs to be made to Mahout, I’ll try to respond quickly.

To use the Hadoop MapReduce version (Ted’s suggestion) you’ll lose the cross-cooccurrence
indicators and you’ll have to translate your IDs into Mahout IDs. This means mapping user
and item IDs from your values into non-negative integers representing the row (user) and column
(item) numbers.
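
The ID translation described above could be sketched roughly as follows, in Python. This is a minimal illustration under assumptions: the helper names and the sample interactions are hypothetical, and it simply numbers IDs in first-seen order. It is not a Mahout API.

```python
def build_index(values):
    """Assign each distinct value a non-negative integer in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def translate(interactions):
    """Map (user, item) pairs to (row, column) integer pairs."""
    users = build_index(user for user, _ in interactions)
    items = build_index(item for _, item in interactions)
    encoded = [(users[user], items[item]) for user, item in interactions]
    return encoded, users, items


# Made-up sample data: (user ID, item ID) pairs as they might appear in a log.
interactions = [("u-ann", "iphone"), ("u-bob", "ipad"), ("u-ann", "ipad")]
encoded, users, items = translate(interactions)

# Inverse mapping, needed to turn column numbers in the output back into item IDs.
item_names = {column: item for item, column in items.items()}
```

Keep the dictionaries around after the job runs, since the output refers to items by column number and has to be translated back to your own IDs.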


BTW: Spark’s Maven artifacts were built incorrectly when using Hadoop 1.2.1. This is being
fixed in a future version of Spark, and in any case I don’t think it affects the Hadoop 2.x
versions of the Spark artifacts, so you may not need to build Spark 1.1.0.

On Dec 23, 2014, at 7:23 AM, AlShater, Hani <halshater@souq.com> wrote:

@Pat, thanks for your answers. It seems that I had cloned the snapshot
before the feature for configuring Spark was added. It now works in
local mode. Unfortunately, after trying the new snapshot and Spark,
submitting to the cluster in yarn-client mode raises the following error:
Exception in thread "main" java.lang.AbstractMethodError
   at org.apache.spark.Logging$class.log(Logging.scala:52)
   at org.apache.spark.deploy.yarn.Client.log(Client.scala:39)
   at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:39)
   at org.apache.spark.deploy.yarn.Client.logClusterResourceDetails(Client.scala:103)
   at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:60)
   at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81)
   at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:323)
   at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
   at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
   at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
   at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
   at scala.Option.map(Option.scala:145)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
   at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)

and submitting in yarn-cluster mode raises this error:
Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
   at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1571)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
   at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
   at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
   at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
   at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
   at scala.Option.map(Option.scala:145)
   at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
   at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:191)
   at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1566)
   ... 10 more

My cluster consists of 3 nodes, and I am using Hadoop 2.4.0. I got Spark
1.1.0 and the Mahout snapshot, then compiled, packaged, and installed them to
the local Maven repo. Am I missing something?

Thanks again



Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
Mob: +962 790471101 | Phone: +962 65821236 | Skype:
hani.alshater@outlook.com | halshater@souq.com <lghafri@souq.com> |
www.souq.com
Nouh Al Romi Street, Building number 8, Amman, Jordan


On Tue, Dec 23, 2014 at 11:17 AM, hlqv <hlqvuong@gmail.com> wrote:

> Hi Pat Ferrel
> Using the --omitStrength option outputs indexable data, but this leads to
> less accuracy when querying because the similarity values between items are
> omitted. Is there a way to include these values in order to improve accuracy
> in a search engine?
> 
> On 23 December 2014 at 02:17, Pat Ferrel <pat@occamsmachete.com> wrote:
> 
>> Also Ted has an ebook you can download:
>> mapr.com/practical-machine-learning
>> 
>> On Dec 22, 2014, at 10:52 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
>> 
>> Hi Hani,
>> 
>> I recently read about Souq.com. A very promising project.
>> 
>> If you are looking at spark-itemsimilarity for ecommerce-type
>> recommendations, you may be interested in some slide decks and blog posts
>> I’ve done on the subject.
>> Check out:
>> 
>> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> 
>> Also I put up a demo site that uses some of these techniques:
>> https://guide.finderbots.com
>> 
>> Good luck,
>> Pat
>> 
>> On Dec 21, 2014, at 11:44 PM, AlShater, Hani <halshater@souq.com> wrote:
>> 
>> Hi All,
>> 
>> I am trying to use spark-itemsimilarity on a dataset of 160M user
>> interactions. The job launches and runs successfully for small data (1M
>> actions). However, with the larger dataset, some Spark stages continuously
>> fail with out-of-memory exceptions.
>> 
>> I tried changing spark.storage.memoryFraction from the Spark default
>> configuration, but I face the same issue again. How can I configure Spark
>> when using spark-itemsimilarity, or how can I overcome this out-of-memory
>> issue?
>> 
>> Can you please advise?
>> 
>> Thanks,
>> Hani.
>> 
>> Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
>> Mob: +962 790471101 | Phone: +962 65821236 | Skype:
>> hani.alshater@outlook.com | halshater@souq.com <lghafri@souq.com> |
>> www.souq.com
>> Nouh Al Romi Street, Building number 8, Amman, Jordan
>> 
>> --
>> 
>> *Download free Souq.com <http://souq.com/> mobile apps for iPhone
>> <https://itunes.apple.com/us/app/id675000850>, iPad
>> <https://itunes.apple.com/ae/app/souq.com/id941561129?mt=8>, Android
>> <https://play.google.com/store/apps/details?id=com.souq.app> or Windows
>> Phone
>> <http://www.windowsphone.com/en-gb/store/app/souq/63803e57-4aae-42c7-80e0-f9e60e33b1bc>
>> and never miss a deal!*
>> 
> 
