spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: how to maintain the offset for spark streaming if HDFS is the source
Date Tue, 16 Jun 2015 13:11:28 GMT
With Spark Streaming, when you use fileStream or textFileStream it will
only pick up files in the directory whose modification timestamp is later
than the current timestamp (that is, files that arrive after the stream has
started). If you have checkpointing enabled, then after a restart it would
resume from the last read timestamp. So you may not need to maintain the
line number yourself.
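
To make the idea concrete, here is a small plain-Python sketch of
timestamp-based file pickup with a persisted checkpoint. It is only an
illustration of the mechanism described above, not Spark's actual
implementation: the names poll_new_files, load_checkpoint and
save_checkpoint and the JSON checkpoint file are made up for this example,
and it uses a local directory rather than HDFS.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last processed modification time, or 0.0 if none is saved."""
    if not os.path.exists(path):
        return 0.0
    with open(path) as f:
        return json.load(f)["last_mtime"]


def save_checkpoint(path, last_mtime):
    """Persist the newest modification time handled so far."""
    with open(path, "w") as f:
        json.dump({"last_mtime": last_mtime}, f)


def poll_new_files(directory, checkpoint_path):
    """Return names of files newer than the checkpoint, then advance it."""
    last_mtime = load_checkpoint(checkpoint_path)
    candidates = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if not os.path.isfile(full):
            continue
        mtime = os.path.getmtime(full)
        if mtime > last_mtime:
            candidates.append((mtime, name))
    candidates.sort()  # oldest first
    if candidates:
        save_checkpoint(checkpoint_path, candidates[-1][0])
    return [name for _, name in candidates]


def demo():
    """Poll twice; the second poll reads its state back from disk, as a restart would."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.mkdir(data_dir)
        # Keep the checkpoint outside the watched directory.
        checkpoint = os.path.join(root, "checkpoint.json")

        def write(name, mtime):
            path = os.path.join(data_dir, name)
            with open(path, "w") as f:
                f.write("line\n")
            os.utime(path, (mtime, mtime))  # fixed timestamps keep the demo deterministic

        write("a.txt", 1000)
        write("b.txt", 2000)
        first = poll_new_files(data_dir, checkpoint)
        write("c.txt", 3000)
        second = poll_new_files(data_dir, checkpoint)
        return first, second
```

Because the only state is the last timestamp, the second poll skips a.txt
and b.txt without tracking any per-line offset. Note the granularity is
whole files: this sketch does not resume in the middle of a file.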

Thanks
Best Regards

On Tue, Jun 16, 2015 at 5:55 PM, Manohar753 <manohar.reddy@happiestminds.com> wrote:

> Hi All,
> In my use case, an HDFS file is the source for Spark Streaming.
> The job processes the data line by line, but how can I make sure to
> maintain the offset (the line number of data already processed) across
> restarts or new code pushes?
>
> Team, can you please reply: is there any configuration for this in Spark?
>
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-maintain-the-offset-for-spark-streaming-if-HDFS-is-the-source-tp23336.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
