spark-user mailing list archives

From 尹绪森 <yinxu...@gmail.com>
Subject Re: Non-deterministic behavior in spark
Date Fri, 24 Jan 2014 13:29:12 GMT
1. Are there any in-place operations in your code, such as addi() on a
DoubleMatrix? This kind of operation modifies the original data.

2. You could try the Spark replay debugger; it has an assert function.
Hope that helps.
http://spark-replay-debugger-overview.readthedocs.org/en/latest/
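
To illustrate point 1, and the earlier suggestion about seeding quoted below, here is a minimal sketch in plain Python. It is not Spark or jblas code: the lists stand in for a DoubleMatrix, and every helper name (add_in_place, add_copy, count_positive, sample_filter) is made up for illustration. It only shows why repeated evaluation over mutated data, or over an unseeded random filter, can give different counts.

```python
import random

# Hypothetical stand-in for a jblas DoubleMatrix: a plain list of floats.
def add_in_place(vec, delta):
    """Mutates vec (in the spirit of addi()): the caller's data changes."""
    for i in range(len(vec)):
        vec[i] += delta
    return vec

def add_copy(vec, delta):
    """Returns a new list (in the spirit of add()): the input is untouched."""
    return [x + delta for x in vec]

def count_positive(records, transform):
    """One 'run' of a job: transform each record, count those with a positive sum."""
    return sum(1 for r in records if sum(transform(r, 1.0)) > 0)

def make_records():
    return [[-3.0, 0.0], [-5.0, 0.0], [2.0, 1.0]]

# In-place: each run sees data already shifted by the previous runs,
# so re-evaluating the same pipeline gives a different count every time.
shared = make_records()
in_place_runs = [count_positive(shared, add_in_place) for _ in range(3)]

# Copying: every run sees the same input, so every run agrees.
fresh = make_records()
copy_runs = [count_positive(fresh, add_copy) for _ in range(3)]

def sample_filter(records, seed=None):
    """A random filter; passing a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < 0.5]

print(in_place_runs)  # counts drift between runs
print(copy_runs)      # counts stay the same
print(sample_filter(list(range(20)), seed=42) ==
      sample_filter(list(range(20)), seed=42))  # True: same seed, same result
```

If the counts only stabilise after switching from the mutating version to the copying one, in-place updates are a likely cause. This is a sketch of the failure mode, not a diagnosis of the S3 vs. HDFS difference.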


2014/1/24 Ognen Duzlevski <ognen@plainvanillagames.com>

> No. It is a filter that splits a line in a json file and extracts a
> position for it - every run is the same.
>
> That's what bothers me about this.
>
> Ognen
>
>
> On Fri, Jan 24, 2014 at 12:40 PM, 尹绪森 <yinxusen@gmail.com> wrote:
>
>> Is there any non-deterministic code in the filter, such as
>> Random.nextInt()? If so, the program is no longer idempotent. You
>> should specify a seed for it.
>>
>>
>> 2014/1/24 Ognen Duzlevski <ognen@nengoiksvelzud.com>
>>
>>> Hello,
>>>
>>> (Sorry for the sensationalist title) :)
>>>
>>> If I run Spark on files from S3 and do basic transformation like:
>>>
>>> textFile()
>>> filter
>>> groupByKey
>>> count
>>>
>>> I get one number (e.g. 40,000).
>>>
>>> If I do the same on the same files from HDFS, the number spat out is
>>> completely different (VERY different - something like 13,000).
>>>
>>> What would one do in a situation like this? How do I even go about
>>> figuring out what the problem is? This is run on a cluster of 15 instances
>>> on Amazon.
>>>
>>> Thanks,
>>> Ognen
>>>
>>
>>
>>
>> --
>> Best Regards
>> -----------------------------------
>> Xusen Yin    尹绪森
>> Beijing Key Laboratory of Intelligent Telecommunications Software and
>> Multimedia
>> Beijing University of Posts & Telecommunications
>> Intel Labs China
>> Homepage: *http://yinxusen.github.io/ <http://yinxusen.github.io/>*
>>
>
>
>
> --
> "Le secret des grandes fortunes sans cause apparente est un crime oublié,
> parce qu'il a été proprement fait" - Honore de Balzac
>



-- 
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: *http://yinxusen.github.io/ <http://yinxusen.github.io/>*
