spark-user mailing list archives

From ☼ R Nair (रविशंकर नायर) <>
Subject Spark Streaming for more file types
Date Fri, 27 Apr 2018 12:19:43 GMT

I have the following methods in my Scala code, currently executed on demand:

val files = sc.binaryFiles ("file:///imocks/data/ocr/raw")
// Above line takes all PDF files

myconverter signature:

def myconverter (
                    file: (String,
                    ) : Unit  =
//Code to interact with IBM Datamap OCR which converts the PDF files into


I want to change the above code to use Spark Streaming. Unfortunately,
there is no "binaryFiles" function on StreamingContext (it would definitely
be a great addition to Spark). The closest I can think of is to write
something like this:

//Assuming myconverter is not changed

val dstream = ssc.fileStream[BytesWritable,BytesWritable,
SequenceFileAsBinaryInputFormat]("file:///imocks/data/ocr/raw") ;

Unfortunately, nothing works now. I get errors saying that the method
signature does not match, and so on. Can anyone please help me resolve
this issue? I appreciate your help.

Also, wouldn't it be an excellent idea to make all SparkContext methods
available on StreamingContext as well? That way, it would take no extra
effort to change a batch program into a streaming app.

