spark-user mailing list archives

From David Morin <morin.david....@gmail.com>
Subject Re: Spark 3.0.1 Structured streaming - checkpoints fail
Date Thu, 24 Dec 2020 01:01:36 GMT
Thanks, Jungtaek.
OK, I got it. I'll test it and check whether the loss of efficiency is acceptable.
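A minimal sketch of the kind of test David describes, assuming PySpark with the hadoop-aws (`s3a://`) connector on the classpath. The helper `is_shared_checkpoint_uri`, the scheme list `SHARED_SCHEMES`, `start_query`, and the bucket name are hypothetical illustrations, not Spark APIs. The only Spark calls used are the standard `writeStream.format(...).option("checkpointLocation", ...).start()` chain. Per Jungtaek's reply, an S3 checkpoint location now works but is very inefficient, so this only guards against the node-local `file:` path that caused the original failure.

```python
from urllib.parse import urlparse

# Hypothetical set of URI schemes that point at storage shared by every node.
# Which one fits depends on the deployment (HDFS, S3 via s3a, etc.).
SHARED_SCHEMES = {"hdfs", "s3a", "abfs", "abfss", "gs"}


def is_shared_checkpoint_uri(uri: str) -> bool:
    """Return True if `uri` points at shared storage rather than a node-local path."""
    scheme = urlparse(uri).scheme.lower()
    # A bare path ("/opt/spark/...") or a "file:" URI resolves to each
    # executor's own local disk, so other executors cannot see its state files.
    return scheme in SHARED_SCHEMES


def start_query(df, checkpoint_uri: str):
    """Start a streaming query, refusing node-local checkpoint locations.

    `df` is assumed to be a streaming DataFrame (created via spark.readStream).
    """
    if not is_shared_checkpoint_uri(checkpoint_uri):
        raise ValueError(
            f"checkpoint location {checkpoint_uri!r} is not on shared storage"
        )
    return (
        df.writeStream
        .format("console")
        .option("checkpointLocation", checkpoint_uri)
        .start()
    )
```

Usage would look like `start_query(events, "s3a://my-bucket/query6/checkpointlocation")`, where `events` and `my-bucket` are placeholders. The check is deliberately a fail-fast guard: it catches the misconfiguration at startup instead of thousands of batches later, as in the stack trace below.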


On Wed, Dec 23, 2020 at 23:29, Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> Please refer to my previous answer:
> https://lists.apache.org/thread.html/r7dfc9e47cd9651fb974f97dde756013fd0b90e49d4f6382d7a3d68f7%40%3Cuser.spark.apache.org%3E
> We may want to add this to the SS (Structured Streaming) guide. We didn't
> need it before because it simply didn't work with the eventually consistent
> model; now it works, but it is very inefficient.
>
>
> On Thu, Dec 24, 2020 at 6:16 AM David Morin <morin.david.bzh@gmail.com>
> wrote:
>
>> Does it work with the standard AWS S3 solution and its new
>> consistency model
>> <https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/>?
>>
>> On Wed, Dec 23, 2020 at 18:48, David Morin <morin.david.bzh@gmail.com>
>> wrote:
>>
>>> Thanks.
>>> My Spark applications run on nodes based on Docker images, in
>>> standalone mode (1 driver, n workers).
>>> Can we use S3 directly with a consistency add-on such as S3Guard (s3a) or
>>> AWS Consistent View
>>> <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html>?
>>>
>>> On Wed, Dec 23, 2020 at 17:48, Lalwani, Jayesh <jlalwani@amazon.com>
>>> wrote:
>>>
>>>> Yes. A distributed file system is necessary because all the workers
>>>> need to read from and write to the checkpoint. The distributed file system
>>>> has to be immediately consistent: when one node writes to it, the other
>>>> nodes must be able to read the data immediately.
>>>>
>>>> The solutions/workarounds depend on where you are hosting your Spark
>>>> application.
>>>>
>>>>
>>>>
>>>> *From: *David Morin <morin.david.bzh@gmail.com>
>>>> *Date: *Wednesday, December 23, 2020 at 11:08 AM
>>>> *To: *"user@spark.apache.org" <user@spark.apache.org>
>>>> *Subject: *[EXTERNAL] Spark 3.0.1 Structured streaming - checkpoints
>>>> fail
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>>
>>>> I have an issue with my PySpark job related to checkpointing.
>>>>
>>>>
>>>>
>>>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>>>> failure: Task 3 in stage 16997.0 failed 4 times, most recent failure: Lost
>>>> task 3.3 in stage 16997.0 (TID 206609, 10.XXX, executor 4):
>>>> java.lang.IllegalStateException: Error reading delta file
>>>> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta of
>>>> HDFSStateStoreProvider[id = (op=0,part=3),dir =
>>>> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3]: *file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta
>>>> does not exist*
>>>>
>>>>
>>>>
>>>> This job is based on Spark 3.0.1 and Structured Streaming.
>>>>
>>>> This Spark cluster (1 driver and 6 executors) runs without HDFS, and
>>>> we would rather not manage an HDFS cluster if possible.
>>>>
>>>> Is it necessary to have a distributed file system? What are the
>>>> different solutions/workarounds?
>>>>
>>>>
>>>>
>>>> Thanks in advance
>>>>
>>>> David
>>>>
>>>
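A plain-Python simulation (no Spark involved) of why Jayesh's point about shared storage matters for the error above. Two temporary directories stand in for two executors' local disks, and the relative path mirrors `state/0/3/1.delta` from the stack trace. The helpers `write_delta` and `read_delta` are hypothetical. This is a simplification: the real `HDFSStateStoreProvider` also manages versioning and snapshot files, which the sketch ignores.

```python
import os
import tempfile


def write_delta(node_root: str, rel_path: str, payload: bytes) -> None:
    """Write a state delta file under one node's filesystem root."""
    path = os.path.join(node_root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(payload)


def read_delta(node_root: str, rel_path: str) -> bytes:
    """Read a state delta file, failing like the stack trace if it is missing."""
    path = os.path.join(node_root, rel_path)
    if not os.path.exists(path):
        raise FileNotFoundError(f"Error reading delta file {path}: does not exist")
    with open(path, "rb") as f:
        return f.read()


DELTA = os.path.join("state", "0", "3", "1.delta")

with tempfile.TemporaryDirectory() as node_a, tempfile.TemporaryDirectory() as node_b:
    # Partition 3 first runs on node A, which writes its state to its local disk.
    write_delta(node_a, DELTA, b"state for partition 3")
    # Later, partition 3 is scheduled on node B: the same "file:" path
    # now refers to a different disk, so the delta file is not there.
    try:
        read_delta(node_b, DELTA)
        local_ok = True
    except FileNotFoundError:
        local_ok = False

# With a single shared root (standing in for HDFS or S3), any node finds the file.
with tempfile.TemporaryDirectory() as shared:
    write_delta(shared, DELTA, b"state for partition 3")
    shared_ok = read_delta(shared, DELTA) == b"state for partition 3"

print(local_ok, shared_ok)  # prints: False True
```

The node-local case fails for the same structural reason as the original job: the checkpoint path is identical on every node, but the bytes behind it are not.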
