spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: Spark retrying task indefinietly
Date Mon, 12 Oct 2015 12:21:00 GMT
To answer your question specifically - you can bump the value on spark.streaming.kafka.maxRetries
(see configuration guide: http://spark.apache.org/docs/latest/configuration.html).
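As a sketch of how that could look (the app name here is illustrative; the property itself is the one from the configuration guide), the value can be set on the SparkConf before the streaming context is created:

```scala
import org.apache.spark.SparkConf

// Hypothetical config fragment: allow up to 5 retries for a failed
// Kafka batch before the job gives up, instead of the default.
val conf = new SparkConf()
  .setAppName("KafkaRetryExample") // illustrative name
  .set("spark.streaming.kafka.maxRetries", "5")
```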

That being said, you should avoid this by adding some validation to your deserialization /
parse code.

A quick and dirty way to do it is:

	val lines = messages.flatMapValues(v => Try(v.toInt).toOption)


This way, only the lines that parse successfully are kept around.
Read a bit on scala.util.{Try, Success, Failure} and Option to understand what’s going
on.
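To see the Try/Option mechanics in isolation, here is a small standalone sketch (no Spark needed; plain Scala collections stand in for the stream values):

```scala
import scala.util.Try

// Try wraps the parse so the exception is captured instead of thrown;
// toOption turns Success(n) into Some(n) and Failure(_) into None.
val inputs = Seq("1", "", "42", "abc")
val parsed = inputs.flatMap(v => Try(v.toInt).toOption)
// parsed == Seq(1, 42): the unparseable values are silently dropped,
// which is what flatMapValues does per key on the DStream.
println(parsed)
```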

-adrian



On 10/12/15, 9:05 AM, "Amit Singh Hora" <hora.amit@gmail.com> wrote:

>I am running spark locally to understand how countByValueAndWindow works
>
>
>  
>      val Array(brokers, topics) = Array("192.XX.X.XX:9092", "test1")
>  
>      // Create context with 2 second batch interval
>      val sparkConf = new
>SparkConf().setAppName("ReduceByWindowExample").setMaster("local[1,1]")
>      sparkConf.set("spark.task.maxFailures","1")
>      
>      val ssc = new StreamingContext(sparkConf, Seconds(1)) // batch size 1
>      ssc.checkpoint("D:\\SparkCheckPointDirectory")
>      
>      // Create direct kafka stream with brokers and topics
>      val topicsSet = topics.split(",").toSet
>      val kafkaParams = Map[String, String]("metadata.broker.list" ->
>brokers)
>  
>      val messages = KafkaUtils.createDirectStream[String, String,
>StringDecoder, StringDecoder](
>        ssc, kafkaParams, topicsSet)
>  
>      // Get the lines, split them into words, count the words and print
>      val lines = messages.map(_._2.toInt)
>      val keyValuelines = lines.map { x => (x, 1) }
>  
>      val windowedlines=lines.countByValueAndWindow(Seconds(1),Seconds(1))
>      //window,interval
>      //    val windowedlines = lines.reduceByWindow((x, y) => { x + y },
>Seconds(4) , Seconds(2))
>          windowedlines.print()
>  
>      ssc.start()
>      ssc.awaitTermination()
>
>
>everything works fine as long as numeric data is supplied on the kafka topic, since I
>am using toInt. When a blank string "" is written to the kafka topic it fails
>with a NumberFormatException; that is OK, but the problem is that it keeps
>retrying indefinitely, again and again, failing with the same
>NumberFormatException. Is there any way to control the number of times Spark
>will try to convert the string to an Int, i.e. Spark should try it only [times] times
>and then move on to the next batch of data?
>
>Note - I am using Spark 1.4
>    
>
>
>
>--
>View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-retrying-task-indefinietly-tp25022.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>