spark-user mailing list archives

From dharmendra <>
Subject Streaming from Kinesis is not getting data in Yarn cluster
Date Fri, 15 Jul 2016 20:04:54 GMT
I have created a small Spark Streaming program that fetches data from Kinesis and
writes some of it to a database.
When I run it on a Spark standalone cluster with master local[*] it works
fine, but when I run it on a YARN cluster with master "yarn", the
application doesn't receive any data.

I submit the job using the following command:
spark-submit --class <<className>> --master yarn --deploy-mode cluster
--queue default --executor-cores 2 --executor-memory 2G --num-executors 4
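For reference, a complete invocation in this form might look like the following sketch; the class name and jar path here are placeholders, not from the original post:

```shell
# Placeholders: com.example.KinesisStreamingJob and the jar path are illustrative only.
spark-submit \
  --class com.example.KinesisStreamingJob \
  --master yarn --deploy-mode cluster \
  --queue default \
  --executor-cores 2 --executor-memory 2G --num-executors 4 \
  /path/to/streaming-job.jar
```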

My Java code looks like this (the logger calls assume an slf4j-style `log`):

JavaDStream<Aggregation> enrichStream = 

enrichStream.mapToPair(new PairFunction<Aggregation, Aggregation, Integer>() {
	public Tuple2<Aggregation, Integer> call(Aggregation s) throws Exception {
		log.info("creating tuple " + s);
		return new Tuple2<>(s, 1);
	}
}).reduceByKey(new Function2<Integer, Integer, Integer>() {
	public Integer call(Integer i1, Integer i2) throws Exception {
		log.info("reduce by key {}, {}", i1, i2);
		return i1 + i2;
	}
});
I have put some logging in sparkRecordProcessor and sparkDatabaseProcessor.
I can see that sparkDatabaseProcessor executes every batch interval (10 sec),
but I find no logs from sparkRecordProcessor.
There are no events (avg/sec) in the Spark Streaming UI.
In the Executors tab I can see 3 executors, and the data shown against these
executors is continuously updated.
I also checked the DynamoDB table in Amazon, and the leaseCounter is updated
regularly by my application.
But Spark Streaming gets no data from Kinesis on YARN.
I see "ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0
blocks" many times in the logs.
I don't know what else I need to do to run Spark Streaming on YARN.
