spark-user mailing list archives

From "Santosh.B" <contactsanto...@gmail.com>
Subject Re: AVRO Append HDFS using saveAsNewAPIHadoopFile
Date Mon, 09 Jan 2017 11:20:31 GMT
Yes, it does, but from what I have seen it appends line by line. Please see
the link below:
 https://gist.github.com/QwertyManiac/4724582

This is very slow because of the Avro append. I am thinking of something like
what we normally do for text files, where we buffer the data up to a certain
size and then flush the buffer.
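
The buffer-and-flush idea above can be sketched independently of Avro. What
follows is only a minimal sketch in Python, not code from this thread:
`BatchingAppender`, `flush_fn`, and `max_records` are hypothetical names, and
it counts records rather than bytes for simplicity. The `flush_fn` callback
stands in for whatever opens the HDFS file in append mode and writes the whole
batch, so the append cost is paid once per batch instead of once per record.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchingAppender(Generic[T]):
    """Collects records in memory and hands them to flush_fn in batches."""

    def __init__(self, flush_fn: Callable[[List[T]], None], max_records: int) -> None:
        if max_records < 1:
            raise ValueError("max_records must be at least 1")
        self._flush_fn = flush_fn        # hypothetical sink, e.g. one append session
        self._max_records = max_records  # flush threshold, counted in records
        self._buffer: List[T] = []

    def add(self, record: T) -> None:
        """Buffer one record; flush automatically when the threshold is reached."""
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self) -> None:
        """Write out whatever is buffered; do nothing if the buffer is empty."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._flush_fn(batch)

    def __enter__(self) -> "BatchingAppender[T]":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Flush the final partial batch when the block ends.
        self.flush()
        return False


# Usage: a plain list stands in for the real sink.
batches: List[List[int]] = []
with BatchingAppender(batches.append, max_records=3) as appender:
    for i in range(7):
        appender.add(i)
```

With a threshold of three records, the seven records above reach the sink in
three calls rather than seven. A size-based threshold, as described in the
message, would track the accumulated byte count instead of `len(self._buffer)`.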





On Mon, Jan 9, 2017 at 3:17 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> Avro itself supports appending, but I am not sure whether this functionality
> is available through the Spark API. Just out of curiosity: if your use case
> is only to write to HDFS, you might simply use Flume.
>
> On 9 Jan 2017, at 09:58, awkysam <contactsantoshb@gmail.com> wrote:
>
> Currently, for our project, we are collecting data and pushing it into Kafka
> as messages in Avro format. We need to push this data into HDFS using
> Spark Streaming, and in HDFS it is also stored in Avro format. We partition
> the data by day, so when we write data into HDFS we need to append to the
> same file. Currently we are using GenericRecordWriter, and we will be using
> saveAsNewAPIHadoopFile for writing into HDFS. Is there a way to append data
> to a file in HDFS in Avro format using saveAsNewAPIHadoopFile?
>
> Thanks, Santosh B
