spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Spark Streaming Hangs on Start
Date Thu, 09 Jul 2015 19:10:50 GMT
1. There will be a long-running job with the description "start()", since
that is the job that runs the receivers. It will never end.

2. You need to set the number of cores given to the Spark executors in the
YARN container. That is spark.executor.cores in SparkConf, or
--executor-cores in spark-submit. Since the default is 1, your only
container has one core, which is occupied by the receiver, leaving no cores
to run the map tasks. So the map stage is blocked.

3. Note these log lines, especially "15/07/09 18:29:00 INFO
receiver.ReceiverSupervisorImpl: Received stop signal". I think your
streaming context is somehow being shut down too early, which is causing
the KafkaReceiver to stop. That is something you should debug.


15/07/09 18:27:13 INFO consumer.ConsumerFetcherThread:
[ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
Starting
15/07/09 18:27:13 INFO consumer.ConsumerFetcherManager:
[ConsumerFetcherManager-1436437633199] Added fetcher for partitions
ArrayBuffer([[adhoc_data,0], initOffset 53 to broker
id:42,host:szq1.appadhoc.com,port:9092] )
15/07/09 18:27:13 INFO storage.MemoryStore: ensureFreeSpace(1680)
called with curMem=96628, maxMem=16669841817
15/07/09 18:27:13 INFO storage.MemoryStore: Block
input-0-1436437633600 stored as bytes in memory (estimated size 1680.0
B, free 15.5 GB)
15/07/09 18:27:13 WARN storage.BlockManager: Block
input-0-1436437633600 replicated to only 0 peer(s) instead of 1 peers
15/07/09 18:27:14 INFO receiver.BlockGenerator: Pushed block
input-0-1436437633600
15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Received stop signal
15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Stopping
receiver with message: Stopped by driver:
15/07/09 18:29:00 INFO consumer.ZookeeperConsumerConnector:
[adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201],
ZKConsumerConnector shutting down
15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager:
[ConsumerFetcherManager-1436437633199] Stopping leader finder thread
15/07/09 18:29:00 INFO
consumer.ConsumerFetcherManager$LeaderFinderThread:
[adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
Shutting down
15/07/09 18:29:00 INFO
consumer.ConsumerFetcherManager$LeaderFinderThread:
[adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
Stopped
15/07/09 18:29:00 INFO
consumer.ConsumerFetcherManager$LeaderFinderThread:
[adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
Shutdown completed
15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager:
[ConsumerFetcherManager-1436437633199] Stopping all fetchers
15/07/09 18:29:00 INFO consumer.ConsumerFetcherThread:
[ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
Shutting down
15/07/09 18:29:01 INFO consumer.SimpleConsumer: Reconnect due to
socket error: java.nio.channels.ClosedByInterruptException
15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread:
[ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
Stopped
15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread:
[ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
Shutdown completed
15/07/09 18:29:01 INFO consumer.ConsumerFetcherManager:
[ConsumerFetcherManager-1436437633199] All connections stopped
15/07/09 18:29:01 INFO zkclient.ZkEventThread: Terminate ZkClient event thread.
15/07/09 18:29:01 INFO zookeeper.ZooKeeper: Session: 0x14e70eedca00315 closed
15/07/09 18:29:01 INFO zookeeper.ClientCnxn: EventThread shut down
15/07/09 18:29:01 INFO consumer.ZookeeperConsumerConnector:
[adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201],
ZKConsumerConnector shutdown completed in 74 ms
15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
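
To make the arithmetic in point 2 concrete, here is a rough sketch of how many cores remain for tasks once the receivers are running. It is a simplified model, not anything from Spark itself: the function name cores_left_for_tasks is made up for illustration, it assumes each receiver occupies exactly one core for its whole lifetime, and the value 2 in the second call is only an example setting.

```python
def cores_left_for_tasks(num_executors: int, executor_cores: int, num_receivers: int) -> int:
    """Cores left for batch tasks after each receiver has taken one core.

    Simplified model (an assumption): one receiver permanently holds one core.
    """
    total_cores = num_executors * executor_cores
    return max(total_cores - num_receivers, 0)


# The setup from point 2: one container, the default of one core, one Kafka receiver.
print(cores_left_for_tasks(num_executors=1, executor_cores=1, num_receivers=1))

# The same setup after raising --executor-cores to 2 (illustrative value).
print(cores_left_for_tasks(num_executors=1, executor_cores=2, num_receivers=1))
```

The first call gives zero, which is the blocked map stage described above. The second leaves one core free for the map tasks.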
