spark-user mailing list archives

From Rabin Banerjee <dev.rabin.baner...@gmail.com>
Subject Re: Storm HDFS bolt equivalent in Spark Streaming.
Date Wed, 20 Jul 2016 11:06:07 GMT
++Deepak,

There is also an option to use saveAsHadoopFile or saveAsNewAPIHadoopFile,
with which you can customize the file name (and many other things) the way
you want when saving. :)
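
If you do end up rolling your own writer (for example, called from
foreachRDD / foreachPartition), the naming and size-based rotation logic is
small. Below is a minimal sketch in plain Java that writes to a local
directory so it runs without a cluster. The class name RotatingWriter, the
file name pattern, and the byte threshold are assumptions made up for
illustration; they are not Spark or Storm APIs. On HDFS you would swap the
java.nio calls for the Hadoop FileSystem API.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes "|"-delimited records and rotates files by size. */
class RotatingWriter implements Closeable {
    private final Path dir;
    private final String prefix;
    private final String extension;
    private final long maxBytes;
    private int fileIndex = 0;
    private long bytesInCurrentFile = 0;
    private BufferedWriter out;

    RotatingWriter(Path dir, String prefix, String extension, long maxBytes) {
        this.dir = dir;
        this.prefix = prefix;
        this.extension = extension;
        this.maxBytes = maxBytes;
    }

    /** Assumed naming convention: prefix, dash, zero-padded index, extension. */
    Path currentPath() {
        return dir.resolve(String.format("%s-%05d%s", prefix, fileIndex, extension));
    }

    /** Appends one record; starts a new file first if this record would exceed maxBytes. */
    void write(String... fields) throws IOException {
        String line = String.join("|", fields) + "\n";
        long size = line.getBytes(StandardCharsets.UTF_8).length;
        if (out != null && bytesInCurrentFile + size > maxBytes) {
            rotate();
        }
        if (out == null) {
            out = Files.newBufferedWriter(currentPath(), StandardCharsets.UTF_8);
        }
        out.write(line);
        bytesInCurrentFile += size;
    }

    /** Closes the current file and moves on to the next index. */
    private void rotate() throws IOException {
        out.close();
        out = null;
        fileIndex++;
        bytesInCurrentFile = 0;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

Note that this sketch rotates just before a record would push the file over
the limit, so a file only exceeds the threshold when a single record is
larger than the threshold itself.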

Happy Sparking !!!!

Regards,
Rabin Banerjee

On Wed, Jul 20, 2016 at 10:01 AM, Deepak Sharma <deepakmca05@gmail.com>
wrote:

> In Spark Streaming, you have to decide the duration of the micro-batches to
> run.
> Once you get a micro-batch, transform it as per your logic, and then you
> can use saveAsTextFiles on your final DStream (or saveAsTextFile on each
> RDD) to write it to HDFS.
>
> Thanks
> Deepak
>
> On 20 Jul 2016 9:49 am, <Rajesh_Kalluri@dellteam.com> wrote:
>
> *Dell - Internal Use - Confidential *
>
> While writing to Kafka from Storm, the HDFS bolt provides a nice way to
> batch messages, rotate files, apply a file name convention, etc., as shown
> below.
>
> Do you know of something similar in Spark Streaming, or do we have to roll
> our own? If anyone has attempted this, could you share some pointers?
>
> Every other streaming solution, like Flume and NiFi, handles logic like the
> example below.
>
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_storm-user-guide/content/writing-data-with-storm-hdfs-connector.html
>
> // use "|" instead of "," for field delimiter
> RecordFormat format = new DelimitedRecordFormat()
>         .withFieldDelimiter("|");
>
> // Synchronize the filesystem after every 1000 tuples
> SyncPolicy syncPolicy = new CountSyncPolicy(1000);
>
> // Rotate data files when they reach 5 MB
> FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
>
> // Use default, Storm-generated file names
> FileNameFormat fileNameFormat = new DefaultFileNameFormat()
>         .withPath("/foo/");
>
> // Instantiate the HdfsBolt
> HdfsBolt bolt = new HdfsBolt()
>         .withFsUrl("hdfs://localhost:8020")
>         .withFileNameFormat(fileNameFormat)
>         .withRecordFormat(format)
>         .withRotationPolicy(rotationPolicy)
>         .withSyncPolicy(syncPolicy);
