spark-user mailing list archives

From: Xiangrui Meng <men...@gmail.com>
Subject: Re: ALS failure with size > Integer.MAX_VALUE
Date: Tue, 02 Dec 2014 22:54:41 GMT
Hi Bharath,

You can try setting a smaller number of item blocks in this case. 1200 is
definitely too large for ALS. Please try 30 or even smaller. I'm not
sure whether this will solve the problem, because you have 100 items
connected with 10^8 users. There is a JIRA for this issue:

https://issues.apache.org/jira/browse/SPARK-3735

which I will try to implement in 1.3. I'll ping you when it is ready.
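
For reference, a minimal sketch of training with an explicit (small) block
count, assuming the MLlib 1.1 Scala API; the ratings RDD name and the
iteration/lambda values below are placeholders, not taken from your job:

    // Sketch only: train ALS with far fewer blocks than 1200.
    // The `blocks` argument controls both the user and product block counts.
    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    def trainWithFewerBlocks(ratings: RDD[Rating]) = {
      val rank = 10        // same rank as in your setup
      val iterations = 10  // placeholder; keep whatever you use today
      val lambda = 0.01    // placeholder regularization value
      val blocks = 30      // the smaller block count suggested above
      ALS.train(ratings, rank, iterations, lambda, blocks)
    }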

Best,
Xiangrui

On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar <reachbach@gmail.com> wrote:
> Yes, the issue appears to be due to the 2GB block size limitation. I am
> therefore looking for (user, product) block sizing suggestions to work around
> it.
>
> On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>> (It won't be that, since you can see that the error occurs when reading a
>> block from disk. I think this is an instance of the 2GB block size
>> limitation.)
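>>
>> For context, the 2GB limit comes from java.nio memory mapping: FileChannel.map
>> rejects any region larger than Integer.MAX_VALUE bytes, which is exactly the
>> IllegalArgumentException in the stack trace below. A standalone illustration
>> (not Spark code; the file path is hypothetical):
>>
>>     import java.io.RandomAccessFile
>>     import java.nio.channels.FileChannel
>>
>>     val channel = new RandomAccessFile("/tmp/some-large-block", "r").getChannel
>>     // Throws IllegalArgumentException("Size exceeds Integer.MAX_VALUE")
>>     // when channel.size() > Int.MaxValue, mirroring DiskStore.getBytes.
>>     val mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
>>     channel.close()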
>>
>> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
>> <Ilya.Ganelin@capitalone.com> wrote:
>> > Hi Bharath – I’m unsure if this is your problem but the
>> > MatrixFactorizationModel in MLLIB which is the underlying component for
>> > ALS
>> > expects your User/Product fields to be integers. Specifically, the input
>> > to
>> > ALS is an RDD[Rating] and Rating is an (Int, Int, Double). I am
>> > wondering if
>> > perhaps one of your identifiers exceeds MAX_INT, could you write a quick
>> > check for that?
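>> >
>> > A minimal sketch of such a check, assuming the raw identifiers are still
>> > available as Longs before being cast to Int (the RDD name and tuple layout
>> > are illustrative):
>> >
>> >     import org.apache.spark.rdd.RDD
>> >
>> >     // rawRatings: (userId, productId, rating) before conversion to Rating
>> >     def checkIdRange(rawRatings: RDD[(Long, Long, Double)]): Unit = {
>> >       val maxUser = rawRatings.map(_._1).max()
>> >       val maxProduct = rawRatings.map(_._2).max()
>> >       require(maxUser <= Int.MaxValue && maxProduct <= Int.MaxValue,
>> >         s"Identifier overflow: maxUser=$maxUser, maxProduct=$maxProduct")
>> >     }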
>> >
>> > I have been running a very similar use case to yours (with more constrained
>> > hardware resources), and I haven’t seen this exact problem, but I’m sure
>> > we’ve seen similar issues. Please let me know if you have other questions.
>> >
>> > From: Bharath Ravi Kumar <reachbach@gmail.com>
>> > Date: Thursday, November 27, 2014 at 1:30 PM
>> > To: "user@spark.apache.org" <user@spark.apache.org>
>> > Subject: ALS failure with size > Integer.MAX_VALUE
>> >
>> > We're training a recommender with ALS in MLlib 1.1 against a dataset of 150M
>> > users and 4.5K items, with the total number of training records being 1.2
>> > billion (~30GB of data). The input data is spread across 1200 partitions on
>> > HDFS. For the training, rank=10, and we've configured {number of user data
>> > blocks = number of item data blocks}. The number of user/item blocks was
>> > varied between 50 and 1200. Irrespective of the block count (e.g. at 1200
>> > blocks each), there are at least a couple of tasks that end up shuffle
>> > reading > 9.7G each in the aggregate stage (ALS.scala:337) and failing with
>> > the following exception:
>> >
>> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>> >     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
>> >     at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
>> >     at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
>> >     at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
>> >     at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
>> >
>
>

