spark-user mailing list archives

From Marcelo Valle <marcelo.va...@ktech.com>
Subject custom rdd - do I need a hadoop input format?
Date Tue, 17 Sep 2019 15:28:46 GMT
Hi,

I want to create a custom RDD that reads a file in blocks, where a block is
n consecutive lines. Each block should be converted to a Spark DataFrame, and
the blocks should be processed in parallel.
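
To make "block" concrete, here is a minimal sketch of the grouping in plain
Python, with no Spark involved. The names `read_blocks` and `n` are
hypothetical and only for illustration; the sketch also assumes the last block
may be shorter than n when the line count is not a multiple of n.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_blocks(lines: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive blocks of up to n lines, in file order."""
    if n <= 0:
        raise ValueError("n must be positive")
    it = iter(lines)
    while True:
        # Take the next n lines; fewer are returned only at the end of input.
        block = list(islice(it, n))
        if not block:
            return
        yield block


if __name__ == "__main__":
    sample = ["line1", "line2", "line3", "line4", "line5"]
    for index, block in enumerate(read_blocks(sample, 2)):
        print(index, block)
```

Each yielded block is the unit that would become one DataFrame; how to
distribute those blocks across executors is the part I am asking about.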

Question: do I have to implement a custom Hadoop InputFormat to achieve
this, or is it possible to do it with the RDD APIs alone?

Thanks,
Marcelo.

