nifi-dev mailing list archives

From Ricky Saltzer <ri...@cloudera.com>
Subject Re: Load distribution in cluster mode
Date Fri, 06 Feb 2015 20:05:03 GMT
Mark -

Thanks for the fast reply, much appreciated. This is what I figured, but
since I was already in clustered mode, I wanted to make sure there wasn't
an easier way than adding each node as a remote process group.

Is there already a JIRA to track the ability to auto-distribute in
clustered mode, or would you like me to open one?

Thanks again,
Ricky

On Fri, Feb 6, 2015 at 2:58 PM, Mark Payne <markap14@hotmail.com> wrote:

> Ricky,
>
>
> The DistributeLoad processor is simply used to route to one of many
> relationships. So if you have, for instance, 5 different servers that you
> can FTP files to, you can use DistributeLoad to round-robin the files
> among them, so that you end up pushing 20% to each of 5 PutFTP processors.
>
>
> What you’re wanting to do, it sounds like, is to distribute the FlowFiles
> to different nodes in the cluster. The Remote Process Group is how you
> would need to do that at this time. We have discussed having the ability to
> mark a Connection as “Auto-Distributed” (or maybe some better name 😊) and
> have that automatically distribute the data between nodes in the cluster,
> but that feature hasn’t yet been implemented.
>
>
> Does that answer your question?
>
>
> Thanks
>
> -Mark
>
> From: Ricky Saltzer
> Sent: Friday, February 6, 2015 2:56 PM
> To: dev@nifi.incubator.apache.org
>
> Hi -
>
> I have a question regarding load distribution in a clustered NiFi
> environment. I have a really simple example: I'm using the GenerateFlowFile
> processor to generate some random data, then I MD5-hash the file and print
> out the resulting hash.
>
> I want only the primary node to generate the data, but I want both nodes in
> the cluster to share the hashing workload. It appears that if I set the
> scheduling strategy to "On primary node" for the GenerateFlowFile
> processor, the next processor (HashContent) only receives and processes
> data on a single node.
>
> I've put a DistributeLoad processor between GenerateFlowFile and
> HashContent, but this requires me to use a Remote Process Group to
> distribute the load, which doesn't seem intuitive when I'm already
> clustered.
>
> I guess my question is: is it possible for the DistributeLoad processor to
> recognize that NiFi is running in a clustered environment and distribute
> the work of the next processor (HashContent) among all nodes in the
> cluster?
>
> Cheers,
> --
> Ricky Saltzer
> http://www.cloudera.com
>



-- 
Ricky Saltzer
http://www.cloudera.com
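Mark's description of DistributeLoad as round-robin routing among relationships can be illustrated with a small model. This is not NiFi code and does not use any NiFi API: it is a hypothetical Python sketch, assuming five downstream targets named PutFTP-1 through PutFTP-5 and 100 incoming FlowFiles represented as plain strings. Each FlowFile is sent to the next target in turn, so every target ends up with 20% of the files.

```python
from collections import Counter
from itertools import cycle


def round_robin(flowfiles, relationships):
    """Hypothetical model: pair each FlowFile with the next relationship in turn."""
    targets = cycle(relationships)  # repeats the relationship list endlessly
    return [(ff, next(targets)) for ff in flowfiles]


# Hypothetical names standing in for five PutFTP destinations.
relationships = [f"PutFTP-{i}" for i in range(1, 6)]
flowfiles = [f"file-{n}" for n in range(100)]

assignments = round_robin(flowfiles, relationships)
counts = Counter(rel for _, rel in assignments)

for rel in relationships:
    # Fraction of all FlowFiles routed to this target.
    print(rel, counts[rel] / len(flowfiles))
```

Note that this only spreads work across relationships within a single node; as the thread explains, moving FlowFiles to other nodes in the cluster at that time required a Remote Process Group.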
