nifi-users mailing list archives

From Boris Tyukin <>
Subject Re: DistributedMapCacheServer questions
Date Thu, 29 Nov 2018 18:58:50 GMT
Thanks, I already looked at the state manager, but unfortunately I need to
share some values between processors in my case.
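
To make the distinction concrete, here is a toy model in Python. It is not
NiFi's API; the class names ToyComponentState and ToySharedCache and the
component ids are hypothetical, and the only point is that state keyed by a
component's id is invisible to other components, while a shared cache is not.

```python
class ToyComponentState:
    """Toy stand-in for per-component state: each id sees only its own map."""

    def __init__(self):
        self._by_component = {}

    def set_state(self, component_id, state):
        # Store a copy so callers cannot mutate the stored state by accident.
        self._by_component[component_id] = dict(state)

    def get_state(self, component_id):
        # A component with no stored state gets an empty map.
        return dict(self._by_component.get(component_id, {}))


class ToySharedCache:
    """Toy stand-in for a cache that any part of the flow can read or write."""

    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value

    def get(self, key):
        return self._entries.get(key)


# Hypothetical component ids, for illustration only.
state = ToyComponentState()
state.set_state("processor-a", {"last_id": "42"})

cache = ToySharedCache()
cache.put("last_id", "42")

seen_by_b_in_state = state.get_state("processor-b").get("last_id")
seen_by_b_in_cache = cache.get("last_id")
```

In this model processor-b gets nothing back from the per-component state but
does see the value in the shared cache, which is the behaviour I need.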

I am also researching another option, which is to use our internal MySQL database.
I was thinking of creating an indexed table and a few simple Groovy
processors around it to put/get/remove values. That database is already set
up for online replication to another MySQL instance and we can set it up
for HA easily. I know it sounds like more work than just using NiFi
distributed cache and I am not sure if MySQL will handle 1000 requests per
second (even though they will be against a tiny table). But HA setup would
be nice for us and since "distributed" cache is not really distributed, I
am not sure I like it.
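
A minimal sketch of the put/get/remove idea, written in Python rather than
Groovy and using the built-in sqlite3 module as a stand-in for MySQL (an
assumption, only so the example runs without a database server). The table
name kv_cache, the column names and the function names are all hypothetical.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open the database and create the key/value table if it is missing."""
    conn = sqlite3.connect(path)
    # The primary key gives us the index that lookups by key rely on.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv_cache ("
        " cache_key TEXT PRIMARY KEY,"
        " cache_value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def put(conn, key, value):
    """Insert the value, overwriting any existing value for the key."""
    # SQLite syntax; on MySQL an upsert would typically be written as
    # INSERT ... ON DUPLICATE KEY UPDATE instead.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO kv_cache (cache_key, cache_value)"
            " VALUES (?, ?)",
            (key, value),
        )


def get(conn, key):
    """Return the stored value, or None if the key is absent."""
    row = conn.execute(
        "SELECT cache_value FROM kv_cache WHERE cache_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def remove(conn, key):
    """Delete the key and report whether a row was actually removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM kv_cache WHERE cache_key = ?", (key,)
        )
    return cur.rowcount > 0
```

Each operation is a single-row statement against the primary key, so the
load on the database would be roughly one indexed lookup or write per call.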

ZK is an option as well, I think, since we already have it (for NiFi, Kafka
and HDFS). It looks like I can create some simple Groovy processors to use the
ZK API. I do not expect a lot of put/get operations - maybe about 1000 per
second max, and based on benchmarks I've seen ZK should be able to handle that.

I've looked at Redis as well and it is awesome but we are not excited to
add another system to maintain - we already have quite a few to keep our
admins busy :)

At least I have choices... :)

Thanks again for your help!

On Thu, Nov 29, 2018 at 1:33 PM Bryan Bende <> wrote:

> I also meant to add that NiFi does provide a "state manager" API to
> processors, which when clustered will use ZooKeeper.
> The difference between this and DMC, is that the state for a processor
> is only accessible to the given processor (or all the instances of the
> processor across the cluster). It is stored by the processor's UUID.
> So if the state doesn't need to be shared across different parts of
> the flow, then you can use this instead. You can look at
> ProcessContext.getStateManager()
> On Thu, Nov 29, 2018 at 1:08 PM Boris Tyukin <> wrote:
> >
> > thanks for the explanation, Bryan! it helps!
> >
> > Boris
> >
> > On Thu, Nov 29, 2018 at 12:26 PM Bryan Bende <> wrote:
> >>
> >> Boris,
> >>
> >> Yes the "distributed" name is confusing... it is referring to the fact
> >> that it is a cache that can be accessed across the cluster, rather
> >> than a local cache on each node, but you are correct that the DMC
> >> server is a single point of failure.
> >>
> >> It is important to separate the DMC client and server. There are
> >> multiple implementations of the DMC client that can interact with
> >> different caches (Redis, HBase, etc), the trade-off being you then
> >> have to run/maintain these external systems, instead of the DMC server
> >> which is fully managed by NiFi.
> >>
> >> Regarding ZK... I don't think there is a good answer other than the
> >> fact that DMC existed when NiFi was open sourced, and NiFi didn't
> >> start using ZK for clustering until the 1.0.0 release, so originally
> >> ZK wasn't in the picture. I assume we could implement a DMC client
> >> that talked to ZK, just like we have done for Redis, HBase, and
> >> others.
> >>
> >> I'm not aware of any issues with the DMC server persisting to file
> >> system or handling concurrent connections, it should be stable.
> >>
> >> Thanks,
> >>
> >> Bryan
> >>
> >> On Thu, Nov 29, 2018 at 11:52 AM Boris Tyukin <> wrote:
> >> >
> >> > Hi guys,
> >> >
> >> > I have a few questions about DistributedMapCacheServer.
> >> >
> >> > First question, I am confused by the "Distributed" part. If I get
> >> > it, the server actually runs on a single node and if it fails, it
> >> > is game over. Is that right? Why is NiFi not using ZK for that,
> >> > since ZK is already used by the NiFi cluster? I see most of the use
> >> > cases / examples are about using DistributedMapCacheServer as a
> >> > lookup or state store, and this is exactly what ZK was designed for;
> >> > it provides redundancy, scalability and 5-10k ops per sec on a
> >> > 3 node ZK cluster.
> >> >
> >> > Second question, I did not find any tools to interact with it other
> >> > than Matt's groovy tool.
> >> >
> >> > Third question, how does a DistributedMapCacheServer that persists
> >> > to the file system handle concurrency and locking? Is it reliable
> >> > and can it be trusted?
> >> >
> >> > And lastly, is there additional overhead to support
> >> > DistributedMapCacheServer as another system, or is it pretty much
> >> > hands off once a controller is set up?
> >> >
> >> > Thanks!
> >> > Boris
