spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yash Sharma <yash...@gmail.com>
Subject Re: Streaming from Kinesis is not getting data in Yarn cluster
Date Fri, 15 Jul 2016 21:14:54 GMT
I struggled with kinesis for a long time and got all my findings documented
at -

http://stackoverflow.com/questions/35567440/spark-not-able-to-fetch-events-from-amazon-kinesis

Let me know if it helps.

Cheers,
Yash

- Thanks, via mobile,  excuse brevity.

On Jul 16, 2016 6:05 AM, "dharmendra" <d17may@gmail.com> wrote:

> I have created small spark streaming program to fetch data from Kinesis and
> put some data in database.
> When i ran it in spark standalone cluster using master as local[*] it is
> working fine but when i tried to run in yarn cluster with master as "yarn"
> application doesn't receive any data.
>
> I submit job using following command
> spark-submit --class <<className>> --master yarn --deploy-mode cluster
> --queue default --executor-cores 2 --executor-memory 2G --num-executors 4
> <<jar>
>
> My java code is like
>
> JavaDStream<Aggregation> enrichStream =
> javaDStream.flatMap(sparkRecordProcessor);
>
> enrichStream.mapToPair(new PairFunction<Aggregation, Aggregation,
> Integer>()
> {
>         @Override
>         public Tuple2<Aggregation, Integer> call(Aggregation s) throws
> Exception {
>                 LOGGER.info("creating tuple " + s);
>                 return new Tuple2<>(s, 1);
>         }
> }).reduceByKey(new Function2<Integer, Integer, Integer>() {
>         @Override
>         public Integer call(Integer i1, Integer i2) throws Exception {
>                 LOGGER.info("reduce by key {}, {}", i1, i2);
>                 return i1 + i2;
>         }
> }).foreach(sparkDatabaseProcessor);
>
>
> I have put some logs in sparkRecordProcessor and sparkDatabaseProcessor.
> I can see that sparkDatabaseProcessor executed every batch interval(10 sec)
> and but find no log in sparkRecordProcessor.
> There is no event(avg/sec) in Spark Streaming UI.
> In Executor tab i can see 3 executors. Data against these executors are
> also
> continuously updated.
> I also check Dynamodb table in Amazon and leaseCounter is updated regularly
> from my application.
> But spark streaming gets no data from Kinesis in yarn.
> I see "shuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0
> blocks" many times in log.
> I don't know what else i need to do to run spark streaming on yarn.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-from-Kinesis-is-not-getting-data-in-Yarn-cluster-tp27345.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Mime
View raw message