From Shashi Vishwakarma <>
Subject Re: Spark Streaming with Nifi
Date Sun, 04 Jun 2017 21:46:39 GMT
Thanks Andrew.

I agree that decoupling the components is a good long-term solution. My
current data pipeline in NiFi is designed for batch processing, and I am
trying to convert it to a streaming model.

One of the processors in the pipeline invokes a Spark job. Once the job
finishes, control returns to the NiFi processor, which in turn generates a
provenance event for the job. This provenance event is important for us.

Keeping the batch architecture in mind, I want to design a Spark
Streaming-based model in which a NiFi Spark Streaming processor processes
each micro-batch and the job status is returned to NiFi with a provenance
event. I can then capture that provenance data for my reports.
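
To make the idea concrete, here is a minimal sketch in plain Python. It is
not NiFi, Kafka, or Spark code: a deque stands in for the buffer between the
two tiers, and BatchStatus and drain_micro_batches are hypothetical names
used only for illustration. It assumes one status record per micro-batch is
what would eventually be turned into a provenance event on the NiFi side.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class BatchStatus:
    """Hypothetical per-micro-batch status record (not a NiFi provenance event)."""
    batch_id: int
    record_count: int
    succeeded: bool
    detail: str = ""


def drain_micro_batches(
    buffer: Deque[str],
    batch_size: int,
    process: Callable[[List[str]], None],
) -> List[BatchStatus]:
    """Consume the buffer in micro-batches and return one status per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    statuses: List[BatchStatus] = []
    batch_id = 0
    while buffer:
        # Take up to batch_size records from the front of the buffer.
        batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
        try:
            process(batch)
            statuses.append(BatchStatus(batch_id, len(batch), True))
        except Exception as exc:
            # Record the failure and keep consuming, as a long-running job would.
            statuses.append(BatchStatus(batch_id, len(batch), False, str(exc)))
        batch_id += 1
    return statuses


def reject_empty_records(batch: List[str]) -> None:
    """Stand-in for the real processing step; fails on an empty record."""
    for record in batch:
        if not record:
            raise ValueError("empty record")


if __name__ == "__main__":
    pending = deque(["a", "b", "", "d", "e"])
    for status in drain_micro_batches(pending, 2, reject_empty_records):
        print(status)
```

The point of the sketch is only that status is reported per micro-batch
rather than per job run, so the streaming side never has to hand control
back to the processor that fed it.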

Essentially, I will be using NiFi to capture provenance events, while the
actual processing will be done by the Spark Streaming job.

Does this approach seem logical to you?


On Sun, Jun 4, 2017 at 3:10 PM, Andrew Psaltis <> wrote:

> Hi Shashi,
> I'm sure there is a way to make this work. However, my first question is
> why you would want to? By design a Spark Streaming application should
> always be running and consuming data from some source, hence the notion of
> streaming. Tying Spark Streaming to NiFi would ultimately result in a more
> coupled and fragile architecture. Perhaps a different way to think about it
> would be to set things up like this:
> NiFi --> Kafka <-- Spark Streaming
> With this you can do what you are doing today -- using NiFi to ingest,
> transform, make routing decisions, and feed data into Kafka. In essence you
> would be using NiFi to do all the preparation of the data for Spark
> Streaming. Kafka would serve the purpose of a buffer between NiFi and Spark
> Streaming. Finally, Spark Streaming would ingest data from Kafka and do
> what it is designed for -- stream processing. Having a decoupled
> architecture like this also allows you to manage each tier separately, so
> you can tune, scale, develop, and deploy each one independently.
> I know I did not directly answer your question about how to make it work,
> but hopefully this provides an approach that will be a better long-term
> solution. There may be something I am missing in your initial
> questions.
> Thanks,
> Andrew
> On Sat, Jun 3, 2017 at 10:43 PM, Shashi Vishwakarma <> wrote:
>> Hi
>> I am looking for a way to make use of Spark Streaming in NiFi. I have
>> seen a couple of posts where a Site-to-Site TCP connection is used for a
>> Spark Streaming application, but I think it would be good if I could
>> launch Spark Streaming from a custom NiFi processor.
>> PublishKafka will publish messages to Kafka, and then the NiFi Spark
>> Streaming processor will read from the Kafka topic.
>> I can launch the Spark Streaming application from a custom NiFi processor
>> using the Spark Streaming launcher API, but the biggest challenge is that
>> it will create a Spark streaming context for each flow file, which can be
>> a costly operation.
>> Would anyone suggest storing the Spark streaming context in a controller
>> service? Or is there a better approach for running a Spark Streaming
>> application with NiFi?
>> Thanks and Regards,
>> Shashi
