spark-issues mailing list archives

From "Saisai Shao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-5142) Possibly data may be ruined in Spark Streaming's WAL mechanism.
Date Thu, 08 Jan 2015 10:01:34 GMT
Saisai Shao created SPARK-5142:
----------------------------------

             Summary: Possibly data may be ruined in Spark Streaming's WAL mechanism.
                 Key: SPARK-5142
                 URL: https://issues.apache.org/jira/browse/SPARK-5142
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.0
            Reporter: Saisai Shao


Currently in Spark Streaming's WAL manager, data is written to HDFS with multiple retries on failure. Because there is no transactional guarantee, previously partially-written data is not rolled back and the retried data is appended after it; this corrupts the file and causes WriteAheadLogReader to fail when reading.
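The corruption described above can be sketched with a minimal length-prefixed log. The 4-byte-length-plus-payload framing and all names below are illustrative assumptions, not Spark's exact on-disk format:

```java
import java.io.*;
import java.util.*;

public class WalCorruptionDemo {
    // Hypothetical record format: 4-byte length prefix followed by the payload.
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] record = "block-1-data".getBytes("UTF-8");

        // First attempt "fails" after the length prefix and part of the payload.
        out.writeInt(record.length);
        out.write(record, 0, 4);          // partial write: 4 of 12 payload bytes

        // Retry appends a complete copy after the partial one (no rollback).
        writeRecord(out, record);

        // Reader: the first length prefix (12) now swallows the 4 stale payload
        // bytes plus 8 bytes of the retried record, so every later record
        // boundary is shifted and the read returns garbage.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        int len1 = in.readInt();          // 12, but points at misaligned bytes
        byte[] first = new byte[len1];
        in.readFully(first);
        System.out.println("first record misread: " + !Arrays.equals(first, record));
    }
}
```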

Firstly, I think this problem is hard to fix because HDFS does not support a truncate operation (HDFS-3107) or random writes at a specific offset.

Secondly, I think that when we hit such a write exception it is better not to retry: retrying corrupts the file and breaks subsequent reads.
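A minimal sketch of why failing fast is safer, again assuming the same illustrative length-prefixed framing (not Spark's actual reader): a truncated tail left by a failed write that was not retried can be treated as a clean end of stream, so every fully written record survives, whereas a retried append after partial bytes shifts all later record boundaries:

```java
import java.io.*;
import java.util.*;

public class TolerantReader {
    // Hypothetical reader: stops cleanly at a trailing partial record.
    static List<byte[]> readAll(byte[] log) {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                int len = in.readInt();
                byte[] rec = new byte[len];
                in.readFully(rec);
                records.add(rec);
            }
        } catch (EOFException e) {
            // Trailing partial record (failed write, not retried): end of
            // stream, with every complete record already collected.
        } catch (IOException e) {
            // Unreachable for in-memory streams; kept for the checked type.
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] rec = "block-1-data".getBytes("UTF-8");
        out.writeInt(rec.length);
        out.write(rec);                 // one complete record
        out.writeInt(rec.length);
        out.write(rec, 0, 4);           // failed write, NOT retried
        List<byte[]> got = readAll(buf.toByteArray());
        System.out.println(got.size()); // prints 1: the complete record survives
    }
}
```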

Sorry if I misunderstand anything.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


