nifi-users mailing list archives

From Yuri Nikonovich <utagai...@gmail.com>
Subject Re: How to configure site-to-site communication between nodes in one cluster.
Date Wed, 01 Jun 2016 16:37:49 GMT
Hello, Bryan
Thanks for the answer.
You've understood me correctly. What I'm trying to achieve is to run some
validation on the dataset. I fetch all the data from the db with one query (I
can't change this behavior), then use the SplitAvro processor to split it
into chunks of, say, 1000 records each. After that I want to treat each chunk
independently: transform each record in the chunk according to my domain
model, validate it, and save it. This transform-load work is what I want to
distribute across the cluster.
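
To make the per-chunk work concrete, here is a rough Python sketch of the
idea, not the actual NiFi flow. The chunk size of 1000 comes from the
description above; the record fields and the transform() and is_valid()
rules are made up purely for illustration.

```python
from itertools import islice

CHUNK_SIZE = 1000  # chunk size mentioned above; adjust as needed


def split_into_chunks(records, size=CHUNK_SIZE):
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def transform(record):
    # Hypothetical domain mapping: normalise the name field.
    return {"id": record["id"], "name": record["name"].strip().lower()}


def is_valid(record):
    # Hypothetical validation rule: the name must not be empty.
    return bool(record["name"])


def process_chunk(chunk):
    """Transform every record in a chunk and keep only the valid ones."""
    transformed = [transform(r) for r in chunk]
    return [r for r in transformed if is_valid(r)]
```

Each call to process_chunk() depends only on its own chunk, which is why
this step could in principle run on any node once the chunks are
distributed.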

While reading about NiFi I haven't found any information about flows
like mine. This worries me a little. Maybe I'm trying to do something
that NiFi is not suited for.

Is NiFi a suitable tool for processing large files, or should I do the
actual processing work outside the NiFi flow?

2016-06-01 17:28 GMT+03:00 Bryan Bende <bbende@gmail.com>:

> Hello,
>
> This post [1] describes how to redistribute data within the
> same cluster. You are correct that it involves an RPG pointing back to the
> same cluster.
>
> One thing to keep in mind is that typically we do this with a List + Fetch
> pattern, where the List operation produces lightweight results like the
> list of filenames to fetch, then redistributes those results and the
> fetching happens in parallel.
> In your case, if I understand it correctly, you will have already fetched
> the data on the first node, and then have to transfer the actual data to
> the cluster nodes, which could have some overhead.
>
> It might require a custom processor to do this, but you might want to
> consider somehow determining what needs to be fetched after receiving the
> HTTP request, and redistributing that so each node can then fetch from the
> DB in parallel.
>
> Let me know if this doesn't make sense.
>
> -Bryan
>
> [1]
> https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
>
>
> On Wed, Jun 1, 2016 at 6:06 AM, Yuri Nikonovich <utagai.by@gmail.com>
> wrote:
>
>> Hi
>> I have the following flow:
>> Receive HTTP request -> Fetch data from db -> split it in chunks of fixed
>> size -> process each chunk and save it to Cassandra.
>>
>> I've built a flow and it works perfectly on a non-clustered setup. But when
>> I configured a clustered setup
>> I found out that all the heavy work is done on only one node. So if the flow
>> has started on node1 it will run to the end on node1. What I want to
>> achieve is to spread the data chunks fetched from the DB across the cluster in
>> order to process them in parallel, but it looks like NiFi doesn't send flow
>> files between nodes in a cluster.
>> As far as I understand, in order to make a node send data to another node I
>> should create a remote process group and send data to this RPG. All the
>> examples I could find on the Internet describe RPGs as cluster-to-cluster
>> communication or remote node-to-cluster communication. So for my case, I
>> assume I have to create an RPG pointing to the same cluster. Could you please
>> point me to a guide on how to do this?
>>
>>
>> --
>> Regards,
>> Nikanovich Yury
>>
>
>


-- 
Best regards,
Yuri Nikonovich
