spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Spark 1.6 Streaming with Checkpointing
Date Fri, 26 Aug 2016 21:11:38 GMT
On Fri, Aug 26, 2016 at 10:54 PM, Benjamin Kim <bbuild11@gmail.com> wrote:

>         // Create a text file stream on an S3 bucket
>         val csv = ssc.textFileStream("s3a://" + awsS3BucketName + "/")
>
>         csv.foreachRDD(rdd => {
>             if (!rdd.partitions.isEmpty) {
>                 // process data
>             }
>         })

Hi Benjamin,

I hardly remember the case now, but I'd recommend moving the above
snippet *after* you getOrCreate the context, i.e.

>         // Get streaming context from checkpoint data or create a new one
>         val context = StreamingContext.getOrCreate(checkpoint,
>             () => createContext(interval, checkpoint, bucket, database, table, partitionBy))

Please move the first snippet here so you only create the pipeline
after you get the context.
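
For reference, the checkpoint pattern in the Spark Streaming
programming guide builds the whole DStream pipeline inside the factory
function passed to getOrCreate, so it can be restored from the
checkpoint on restart. A minimal sketch along those lines — names like
createContext, interval, checkpoint and bucket follow your thread, and
the processing body is a placeholder:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Factory that builds the context *and* the pipeline together, so a
// context recovered from checkpoint data carries the same pipeline.
def createContext(interval: Int, checkpoint: String, bucket: String): StreamingContext = {
  val conf = new SparkConf().setAppName("S3CsvStream")
  val ssc = new StreamingContext(conf, Seconds(interval))
  ssc.checkpoint(checkpoint)

  // Create the text file stream on the S3 bucket inside the factory
  val csv = ssc.textFileStream("s3a://" + bucket + "/")
  csv.foreachRDD { rdd =>
    if (!rdd.partitions.isEmpty) {
      // process data
    }
  }
  ssc
}

// Either recovers the context (with its pipeline) from the checkpoint
// directory, or builds a fresh one via the factory.
val context = StreamingContext.getOrCreate(checkpoint,
  () => createContext(interval, checkpoint, bucket))
context.start()
context.awaitTermination()
```
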

I might be mistaken, too. Sorry.

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

