spark-user mailing list archives

From "Santosh.B" <>
Subject Re: AVRO Append HDFS using saveAsNewAPIHadoopFile
Date Mon, 09 Jan 2017 11:20:31 GMT
Yes, Avro provides it, but from what I have seen it appends record by record;
please see the link below.

This is very slow because of the Avro append. I am thinking of something like
what we normally do for text files, where we buffer the data up to a certain
size and then flush the buffer.
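The buffer-then-flush idea above can be sketched generically (in Python for brevity; the `BufferedAppender` class and its `flush_threshold` parameter are illustrative assumptions, not a Spark or Avro API):

```python
import io

class BufferedAppender:
    """Collects records in memory and flushes them to the sink in one bulk
    write once the buffer reaches a threshold, instead of writing each
    record individually (hypothetical helper, for illustration only)."""

    def __init__(self, sink, flush_threshold=4):
        self.sink = sink                      # any writable text stream
        self.flush_threshold = flush_threshold
        self.buffer = []

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            # one bulk write instead of one write per record
            self.sink.write("\n".join(self.buffer) + "\n")
            self.buffer.clear()

# usage: four records are buffered, then hit the sink in a single write
sink = io.StringIO()
appender = BufferedAppender(sink, flush_threshold=4)
for r in ["a", "b", "c", "d"]:
    appender.append(r)
print(sink.getvalue())
```

The same pattern applies to an Avro `DataFileWriter` over an HDFS output stream: accumulate records, then write a whole block per flush rather than syncing after every record.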

On Mon, Jan 9, 2017 at 3:17 PM, Jörn Franke <> wrote:

> Avro itself supports it, but I am not sure whether this functionality is
> available through the Spark API. Just out of curiosity: if your use case is
> only writing to HDFS, then you might simply use Flume.
> On 9 Jan 2017, at 09:58, awkysam <> wrote:
> Currently, for our project, we are collecting data and pushing it into Kafka
> with messages in Avro format. We need to push this data into HDFS; we are
> using Spark Streaming, and in HDFS the data is also stored in Avro format.
> We partition the data by day, so when we write data into HDFS we need to
> append to the same file. Currently we are using GenericRecordWriter, and we
> will be using saveAsNewAPIHadoopFile for writing into HDFS. Is there a way
> to append data to a file in HDFS in Avro format using
> saveAsNewAPIHadoopFile? Thanks, Santosh B
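Since `saveAsNewAPIHadoopFile` writes a fresh output directory per call rather than appending to an existing file, one common workaround is to write each streaming micro-batch to a new file under the day's partition directory. A minimal sketch of that path layout (the `date=`/`batch-` naming is an assumption for illustration, not something from this thread):

```python
from datetime import datetime, timezone

def batch_output_path(base_dir, batch_time):
    """Build a per-batch output path under a per-day partition directory,
    so every micro-batch writes a new file set instead of appending."""
    day = batch_time.strftime("%Y-%m-%d")
    batch_id = batch_time.strftime("%H%M%S")
    return f"{base_dir}/date={day}/batch-{batch_id}"

# each saveAsNewAPIHadoopFile call would then target a fresh directory
t = datetime(2017, 1, 9, 11, 20, 31, tzinfo=timezone.utc)
print(batch_output_path("/data/events", t))
```

Downstream jobs that read the whole `date=...` directory see the day's data as one logical partition, so appending to a single physical file is not required; many small files can later be compacted by a periodic job if needed.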
