spark-user mailing list archives

From <Rajesh_Kall...@DellTeam.com>
Subject Storm HDFS bolt equivalent in Spark Streaming.
Date Wed, 20 Jul 2016 16:40:44 GMT
While writing to HDFS from Storm, the HDFS bolt provides a nice way to batch messages,
rotate files, set a file naming convention, etc., as shown below.

Do you know of something similar in Spark Streaming, or do we have to roll our own? If anyone
has attempted this, could you share some pointers?

Every other streaming solution, such as Flume and NiFi, handles logic like the example below.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_storm-user-guide/content/writing-data-with-storm-hdfs-connector.html

// use "|" instead of "," for field delimiter
RecordFormat format = new DelimitedRecordFormat()
        .withFieldDelimiter("|");

// Synchronize the filesystem after every 1000 tuples
SyncPolicy syncPolicy = new CountSyncPolicy(1000);

// Rotate data files when they reach 5 MB
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);

// Use default, Storm-generated file names
FileNameFormat fileNameFormat = new DefaultFileNameFormat()
        .withPath("/foo/");


// Instantiate the HdfsBolt
HdfsBolt bolt = new HdfsBolt()
        .withFsUrl("hdfs://localhost:8020")
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(format)
        .withRotationPolicy(rotationPolicy)
        .withSyncPolicy(syncPolicy);
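
For anyone who ends up rolling their own, the three policies above come down to a small amount of bookkeeping. Below is a minimal sketch in plain Java that writes to a local directory rather than HDFS. The class RotatingDelimitedWriter, its methods, and the "part-N.txt" file naming are hypothetical names made up for illustration; they are not part of Storm or Spark, and the flush here is only a stand-in for a real HDFS sync.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a Storm or Spark class: mimics the record format,
// sync policy, and size-based rotation policy on a local directory.
class RotatingDelimitedWriter implements AutoCloseable {
    private final Path dir;
    private final String delimiter;
    private final int syncEvery;       // flush after this many records
    private final long rotateAtBytes;  // rotate once a file reaches this size
    private BufferedWriter out;
    private long bytesInFile;
    private int sinceSync;
    private int fileIndex;

    RotatingDelimitedWriter(Path dir, String delimiter, int syncEvery, long rotateAtBytes)
            throws IOException {
        this.dir = Files.createDirectories(dir);
        this.delimiter = delimiter;
        this.syncEvery = syncEvery;
        this.rotateAtBytes = rotateAtBytes;
        openNextFile();
    }

    // Assumed naming convention: part-0.txt, part-1.txt, ...
    private void openNextFile() throws IOException {
        Path file = dir.resolve("part-" + fileIndex++ + ".txt");
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        bytesInFile = 0;
        sinceSync = 0;
    }

    void write(String... fields) throws IOException {
        // Rotate before writing if the current file already hit the size limit,
        // so no empty trailing file is left behind.
        if (bytesInFile >= rotateAtBytes) {
            out.close();
            openNextFile();
        }
        String line = String.join(delimiter, fields) + "\n";
        out.write(line);
        bytesInFile += line.getBytes(StandardCharsets.UTF_8).length;
        // Count-based sync: flush buffered records every syncEvery writes.
        if (++sinceSync >= syncEvery) {
            out.flush();
            sinceSync = 0;
        }
    }

    int filesOpened() {
        return fileIndex;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

In Spark Streaming, one place a writer like this could plausibly live is inside foreachRDD with foreachPartition, one writer per partition, backed by the Hadoop FileSystem API instead of java.nio. That wiring is an assumption on my part and is not shown or tested here.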


