spark-user mailing list archives

From Igor Berman <igor.ber...@gmail.com>
Subject Re: union and reduceByKey wrong shuffle?
Date Fri, 05 Jun 2015 13:23:17 GMT
This JIRA issue seems to be connected to our problem:
https://issues.apache.org/jira/browse/SPARK-1018

On 2 June 2015 at 19:54, Josh Rosen <rosenville@gmail.com> wrote:

> Ah, interesting.  While working on my new Tungsten shuffle manager, I came
> up with some nice testing interfaces for allowing me to manually trigger
> spills in order to deterministically test those code paths without
> requiring large amounts of data to be shuffled.  Maybe I could make similar
> test interface changes to the existing shuffle code, which might make it
> easier to reproduce this in an isolated environment.
>
> On Mon, Jun 1, 2015 at 11:41 PM, Igor Berman <igor.berman@gmail.com>
> wrote:
>
>> Hi,
>> Small mock data doesn't reproduce the problem. IMHO the problem is reproduced
>> when we make the shuffle big enough to spill data to disk.
>> We will work on understanding and reproducing the problem (not first
>> priority, though...)
>>
>>
>> On 1 June 2015 at 23:02, Josh Rosen <rosenville@gmail.com> wrote:
>>
>>> How much work is it to produce a small standalone reproduction?  Can you
>>> create an Avro file with some mock data, maybe 10 or so records, then
>>> reproduce this locally?
>>>
>>> On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman <igor.berman@gmail.com>
>>> wrote:
>>>
>>>> Switching to simple POJOs instead of using Avro for Spark
>>>> serialization solved the problem (I mean reading Avro from S3 and then
>>>> mapping each Avro object to its serializable POJO counterpart with the same
>>>> fields; the POJO is registered with Kryo).
>>>> Any thoughts on where to look for a problem/misconfiguration?
>>>>
>>>> On 31 May 2015 at 22:48, Igor Berman <igor.berman@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>> We are using Spark 1.3.1
>>>>> chill-avro (will check tomorrow whether it's important); we register Avro
>>>>> classes from Java
>>>>> Avro 1.7.6
>>>>> On May 31, 2015 22:37, "Josh Rosen" <rosenville@gmail.com> wrote:
>>>>>
>>>>>> Which Spark version are you using?  I'd like to understand whether
>>>>>> this change could be caused by recent Kryo serializer re-use changes in
>>>>>> master / Spark 1.4.
>>>>>>
>>>>>> On Sun, May 31, 2015 at 11:31 AM, igor.berman <igor.berman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> After investigation, the problem is somehow connected to Avro
>>>>>>> serialization with Kryo + chill-avro (mapping each Avro object to a
>>>>>>> simple Scala case class and running reduce on these case class
>>>>>>> objects solves the problem).
>>>>>>
>>>>
>>>
>>
>
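
For readers following the workaround in the thread (copying each Avro object into a plain serializable POJO with the same fields before the shuffle), here is a minimal sketch of that copy step. Everything in it is an assumption made for illustration: the `EventPojo` class, its `key` and `count` fields, and the `PojoCopySketch` helper are hypothetical, a `Map` stands in for the Avro object so the snippet runs without Avro or Spark on the classpath, and the merge loop only imitates locally what `union` followed by `reduceByKey` does per key. It does not reproduce or explain the bug itself.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain counterpart of an Avro record; field names are illustrative.
final class EventPojo implements Serializable {
    private static final long serialVersionUID = 1L;
    final String key;
    final long count;

    EventPojo(String key, long count) {
        this.key = key;
        this.count = count;
    }

    // Copy every field out of the source record so that no reference to the
    // source object survives. The Map is a stand-in for the Avro object.
    static EventPojo fromRecord(Map<String, Object> record) {
        // String.valueOf also handles CharSequence values that are not String.
        String key = String.valueOf(record.get("key"));
        long count = ((Number) record.get("count")).longValue();
        return new EventPojo(key, count);
    }

    // Combine two values that share a key; returns a new object.
    EventPojo merge(EventPojo other) {
        return new EventPojo(key, count + other.count);
    }
}

final class PojoCopySketch {
    // Local imitation of union + reduceByKey: concatenate both inputs,
    // convert each record to a POJO, then merge the POJOs per key.
    static Map<String, EventPojo> unionAndReduce(List<Map<String, Object>> left,
                                                 List<Map<String, Object>> right) {
        List<Map<String, Object>> all = new ArrayList<>(left);
        all.addAll(right);
        Map<String, EventPojo> byKey = new LinkedHashMap<>();
        for (Map<String, Object> record : all) {
            EventPojo pojo = EventPojo.fromRecord(record);
            byKey.merge(pojo.key, pojo, EventPojo::merge);
        }
        return byKey;
    }

    // Builds a stand-in record with the two illustrative fields.
    static Map<String, Object> record(String key, long count) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("key", key);
        r.put("count", count);
        return r;
    }

    public static void main(String[] args) {
        Map<String, EventPojo> result = unionAndReduce(
                Arrays.asList(record("a", 1), record("b", 2)),
                Arrays.asList(record("a", 10)));
        for (EventPojo p : result.values()) {
            System.out.println(p.key + " -> " + p.count);
        }
    }
}
```

In an actual Spark job the conversion would run in a `map` over the RDD of Avro objects before `union` and `reduceByKey`, and the POJO class would be registered with Kryo, for example through `SparkConf.registerKryoClasses` or the `spark.kryo.classesToRegister` property.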
