nifi-dev mailing list archives

From Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Subject Re: Ingest Original data from External system by data's dependent condition
Date Tue, 13 Oct 2015 15:08:23 GMT
Great points Joe!

One point I want to add to the discussion. . .

I am still learning the internals of NiFi, but the use case at the core of this thread
is actually a very common EIP (Enterprise Integration Patterns) problem, and while an
Aggregator (Merger) receiving from multiple inbound sources is one approach, it is not
the only one.
Another pattern that would probably fit better here is ClaimCheck in combination with
a MessageStore.
The way it would work is like this:
- The original FlowFile (Message) is stored in the MessageStore under a given key (the
ClaimCheck), which becomes an attribute that is passed downstream
- Somewhere downstream, whenever you are ready for aggregation, use the ClaimCheck to access
the MessageStore and get the original Message to perform aggregation or whatever else.
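
To make the two steps above a bit more concrete, here is a minimal Python sketch of the
pattern. It is only an illustration, not NiFi code: MessageStore, FlowFile, the
"claim.check" attribute name and the functions below are all names I made up, and the
store is a plain in-memory dict.

```python
import uuid
from dataclasses import dataclass, field


class MessageStore:
    """Hypothetical in-memory store; a real one would be durable."""

    def __init__(self):
        self._messages = {}

    def check_in(self, content: bytes) -> str:
        # Store the original content and hand back a key (the claim check).
        claim_check = str(uuid.uuid4())
        self._messages[claim_check] = content
        return claim_check

    def retrieve(self, claim_check: str) -> bytes:
        # Anyone holding the claim check can get the original content back.
        return self._messages[claim_check]


@dataclass
class FlowFile:
    """Simplified stand-in for a flow file: content plus attributes."""

    content: bytes
    attributes: dict = field(default_factory=dict)


def ingest(store: MessageStore, raw: bytes) -> FlowFile:
    # Step 1: store the original and pass the claim check on as an attribute.
    claim_check = store.check_in(raw)
    return FlowFile(content=raw, attributes={"claim.check": claim_check})


def transform(flow_file: FlowFile) -> FlowFile:
    # Any downstream step may replace the content; attributes travel along.
    return FlowFile(
        content=flow_file.content.upper(),
        attributes=dict(flow_file.attributes),
    )


def aggregate(store: MessageStore, flow_file: FlowFile):
    # Step 2: use the claim check to fetch the original next to the converted data.
    original = store.retrieve(flow_file.attributes["claim.check"])
    return original, flow_file.content
```

A caller would run ingest(), any number of transform-like steps, and then aggregate() at
the point where both the original and the converted content are needed. Eviction of
stored messages is left out of the sketch on purpose.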

The general benefit is that access to the original message may be required not only for
aggregation but for a variety of other use cases. Anyone who holds the ClaimCheck can get
to the original message.

So, I want to use this as an opportunity to ask the wider NiFi group (since I am still learning
it myself) whether such a pattern is supported. I know there is a ContentRepository, so I am
assuming it wouldn't be that difficult.

Cheers
Oleg

> On Oct 13, 2015, at 10:56 AM, Joe Witt <joe.witt@gmail.com> wrote:
> 
> Lot of details passing by here but...
> 
> Given formats A,B...Z coming in the following capabilities are
> generally desired:
> 1) Extract attributes of each event
> 2) Make routing decisions on each event based on those extracted attributes
> 3) Deliver raw/unmodified data to some endpoint (like HDFS)
> 4) Convert/Transform data to some normalized format (and possibly schema too).
> 5) Deliver converted data to some endpoint.
> 
> Steps #1 and #4 involve (naturally) custom work for formats that are
> not something we can readily support out of the box such as XML, JSON,
> AVRO, etc...  Even the workaround suggested really only works for the
> case where you know the original format well enough and we can support
> it which means we'd likely not have needed the workaround anyway.  So,
> the issue remains that custom work is required for #1 and #4 cases...
> Now, if you have packed formats that you think we could support please
> let us know and we can see about some mechanism of dealing with those
> formats generically - would be a power user tool of course but
> avoiding custom work is great when achievable with the right user
> experience/capability mix.
> 
> Thanks
> Joe
> 
> On Tue, Oct 13, 2015 at 10:06 AM, yejug <msparysh@gmail.com> wrote:
>> Ok,
>> 
>> Thank you guys for assistance.
>> 
>> Looks like Joe's suggestion is more appropriate for me, but there is one BUT:
>> with 'ExtractXYZAttributes' we must implement implicit parsing of the encoded
>> message and cannot reuse this logic, e.g. if we want to do an actual XXX ->
>> Json (for example json =)) conversion in the future.
>> 
>> With 99.9% certainty, in my case there will be more inputs besides AVRO (at
>> minimum msgpack and some custom binary formats), which must be parsed as well
>> as stored in the original input format.
>> 
>> So I think, apart from ConvertXXXToJson + Andrew's workaround, there are no
>> more alternatives for me now.
>> 
>> Thanks again
>> 
>> 
>> 
>> --
>> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Ingest-Original-data-from-External-system-by-data-s-dependent-condition-tp3093p3101.html
>> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
> 
