nifi-users mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: DistributedMapCacheServer questions
Date Thu, 29 Nov 2018 19:58:14 GMT
thanks guys

here is the new Jira as requested
https://issues.apache.org/jira/browse/NIFI-5853



On Thu, Nov 29, 2018 at 2:06 PM Otto Fowler <ottobackwards@gmail.com> wrote:

> Maybe you can open a Jira for a ZK client like Bryan mentions?
>
>
> On November 29, 2018 at 13:59:36, Boris Tyukin (boris@boristyukin.com)
> wrote:
>
> thanks, already looked at state manager but unfortunately need to share
> some values between processors in my case.
>
> I am also researching another option, which is to use our internal MySQL database.
> I was thinking to create an indexed table and a few simple groovy
> processors around it to put/get/remove values. That database is already set
> up for online replication to another MySQL instance and we can set it up
> for HA easily. I know it sounds like more work than just using NiFi
> distributed cache and I am not sure if MySQL will handle 1000 requests per
> second (even though they will be against a tiny table). But HA setup would
> be nice for us and since "distributed" cache is not really distributed, I
> am not sure I like it.
>
> ZK is an option as well I think since we already have it (for NiFi, Kafka
> and HDFS). Looks like I can create some simple groovy processors to use ZK
> API. I do not expect a lot of put/get operations - maybe about 1000 per
> second max and based on benchmarks I've seen ZK should be able to handle
> this.
>
> I've looked at Redis as well and it is awesome but we are not excited to
> add another system to maintain - we already have quite a few to keep our
> admins busy :)
>
> At least I have choices... :)
>
> Thanks again for your help!
>
> On Thu, Nov 29, 2018 at 1:33 PM Bryan Bende <bbende@gmail.com> wrote:
>
>> I also meant to add that NiFi does provide a "state manager" API to
>> processors, which when clustered will use ZooKeeper.
>>
>> The difference between this and DMC, is that the state for a processor
>> is only accessible to the given processor (or all the instances of the
>> processor across the cluster). It is stored by the processor's UUID.
>>
>> So if the state doesn't need to be shared across different parts of
>> the flow, then you can use this instead. You can look at
>> ProcessContext.getStateManager()
>>
>> On Thu, Nov 29, 2018 at 1:08 PM Boris Tyukin <boris@boristyukin.com>
>> wrote:
>> >
>> > thanks for the explanation, Bryan! it helps!
>> >
>> > Boris
>> >
>> > On Thu, Nov 29, 2018 at 12:26 PM Bryan Bende <bbende@gmail.com> wrote:
>> >>
>> >> Boris,
>> >>
>> >> Yes the "distributed" name is confusing... it is referring to the fact
>> >> that it is a cache that can be accessed across the cluster, rather
>> >> than a local cache on each node, but you are correct that the DMC
>> >> server is a single point of failure.
>> >>
>> >> It is important to separate the DMC client and server, there are
>> >> multiple implementations of the DMC client that can interact with
>> >> different caches (Redis, HBase, etc), the trade-off being you then
>> >> have to run/maintain these external systems, instead of the DMC server
>> >> which is fully managed by NiFi.
>> >>
>> >> Regarding ZK... I don't think there is a good answer other than the
>> >> fact that DMC existed when NiFi was open sourced, and NiFi didn't
>> >> start using ZK for clustering until the 1.0.0 release, so originally
>> >> ZK wasn't in the picture. I assume we could implement a DMC client
>> >> that talked to ZK, just like we have done for Redis, HBase, and
>> >> others.
>> >>
>> >> I'm not aware of any issues with the DMC server persisting to file
>> >> system or handling concurrent connections, it should be stable.
>> >>
>> >> Thanks,
>> >>
>> >> Bryan
>> >>
>> >> On Thu, Nov 29, 2018 at 11:52 AM Boris Tyukin <boris@boristyukin.com>
>> >> wrote:
>> >> >
>> >> > Hi guys,
>> >> >
>> >> > I have a few questions about DistributedMapCacheServer.
>> >> >
>> >> > First question, I am confused by the "Distributed" part. If I get it,
>> >> > the server actually runs on a single node and if it fails, it is game
>> >> > over. Is that right? Why is NiFi not using ZK for that, since ZK is
>> >> > already used by the NiFi cluster? I see most of the use cases / examples
>> >> > are about using DistributedMapCacheServer as a lookup or state store, and
>> >> > this is exactly what ZK was designed for; it provides redundancy,
>> >> > scalability and 5-10k ops per sec on a 3 node ZK cluster.
>> >> >
>> >> > Second question, I did not find any tools to interact with it other
>> >> > than Matt's groovy tool.
>> >> >
>> >> > Third question, how does a DistributedMapCacheServer that persists to the
>> >> > file system handle concurrency and locking? Is it reliable and can it be
>> >> > trusted?
>> >> >
>> >> > And lastly, is there additional overhead to support
>> >> > DistributedMapCacheServer as another system, or is it pretty much hands
>> >> > off once a controller is set up?
>> >> >
>> >> > Thanks!
>> >> > Boris
>>
>
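
The indexed-table idea from the thread (a small table with put/get/remove operations around it) could be sketched roughly as below. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the sketch runs on its own, and the table name `kv_cache`, its columns, and the helper functions are hypothetical names that do not come from the thread or from NiFi.

```python
import sqlite3

# Assumption: sqlite3 replaces MySQL here so the sketch is self-contained.
# The table name, column names, and helper names are all hypothetical.


def open_store(path=":memory:"):
    """Open the store and create the key/value table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv_cache ("
        " cache_key TEXT PRIMARY KEY,"  # the primary key provides the index
        " cache_value TEXT NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Insert the value, overwriting any existing value for the key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO kv_cache (cache_key, cache_value)"
            " VALUES (?, ?)",
            (key, value),
        )


def get(conn, key):
    """Return the stored value, or None if the key is absent."""
    row = conn.execute(
        "SELECT cache_value FROM kv_cache WHERE cache_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def remove(conn, key):
    """Delete the key; return True if a row was actually removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM kv_cache WHERE cache_key = ?", (key,)
        )
    return cur.rowcount > 0
```

Against a real MySQL instance the overwrite statement would differ (MySQL has its own upsert syntax), and whether such a table sustains around 1000 requests per second would still need to be measured, as the thread notes.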
