spark-user mailing list archives

From Ajay Srivastava
Subject Join : Giving incorrect result
Date Wed, 04 Jun 2014 12:32:32 GMT

I am joining two RDDs, and the join gives a different result (the number of joined records
counted) each time I run the code on the same input.

The input files are large enough to be divided into two splits. When the program runs on two
workers with a single core assigned to each, the output is consistent and looks correct. But
when a single worker is used with two or more cores, the result seems to be random: the count
of joined records is different on every run.
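
For reference, the count that an inner join on keys should produce is fixed by the input, so it can be computed independently of Spark and compared against each run. The following is only a minimal sketch in plain Python, assuming each record is a (key, value) pair; the function name `expected_join_count` and the sample data are hypothetical and are not taken from my actual job.

```python
from collections import Counter


def expected_join_count(left, right):
    """Return the number of records an inner join on key should produce.

    Each key contributes (occurrences in left) * (occurrences in right).
    """
    left_keys = Counter(key for key, _ in left)
    right_keys = Counter(key for key, _ in right)
    # Counter returns 0 for keys missing on the right side,
    # so unmatched keys contribute nothing to the total.
    return sum(n * right_keys[key] for key, n in left_keys.items())


# Hypothetical sample data, for illustration only.
left = [("a", 1), ("a", 2), ("b", 3)]
right = [("a", "x"), ("b", "y"), ("b", "z"), ("c", "w")]

print(expected_join_count(left, right))
```

Because this number depends only on how often each key occurs on each side, it should not change with the number of workers or cores.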

Does this sound like a defect, or do I need to take care of something when using join? I am
using spark-0.9.1.

