nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: FlattenJson
Date Fri, 23 Mar 2018 15:24:35 GMT
Most of the ideas discussed here assume there is one record per
flow file, which is not what you want to do for any serious amount
of data.

It might be better to have a new ExtractJsonToAttributes processor
that enforces limitations like a single JSON document per flow file
and all flat fields, so if you don't split and flatten beforehand
then it routes to failure.
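
A minimal sketch of the check such a processor might perform, assuming the flow file content has already been parsed into plain Java objects (a Map for a JSON object, a List for an array). The class and method names are hypothetical, no NiFi API is used, and the rule of skipping null or empty values is borrowed from the Groovy script quoted further down. An empty result stands in for routing to failure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of NiFi: decides whether a parsed JSON value
// is a single flat document and, if so, turns it into attribute pairs.
public class FlatJsonToAttributes {

    public static Optional<Map<String, String>> toAttributes(Object parsed) {
        if (!(parsed instanceof Map)) {
            // Root is an array or a scalar: not a single JSON document.
            return Optional.empty();
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) parsed).entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map || v instanceof Iterable) {
                // Nested field: the document was not flattened beforehand.
                return Optional.empty();
            }
            if (v != null && !v.toString().isEmpty()) {
                attrs.put(String.valueOf(e.getKey()), v.toString());
            }
        }
        return Optional.of(attrs);
    }

    public static void main(String[] args) {
        // Illustrative values only; the field names come from the thread.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("job_name", "nightly");
        doc.put("job_status", 3);
        System.out.println(toAttributes(doc));
    }
}
```

Rejecting nested values outright, rather than flattening them here, keeps the attribute step separate from FlattenJson, which would run earlier in the flow.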


On Fri, Mar 23, 2018 at 11:07 AM, Jorge Machado <jomach@me.com> wrote:
> So I’m pretty lost now; none of Matt’s suggestions solve my problem, which is that
> I need to have all the contents of a flow file as attribute key-value pairs…
>
> A good place to have it would be ConvertAvroToJSON, with an option to say
> whether the result goes to attributes or to the FlowFile content, defaulting to content.
>
> Would the change be accepted? I would create a PR for it.
>
>
> Jorge Machado
>
>
>
>
>
>> On 20 Mar 2018, at 22:35, Otto Fowler <ottobackwards@gmail.com> wrote:
>>
>> We could start with routeOnJsonPath and do the record path as the need
>> arises?
>>
>>
>> On March 20, 2018 at 16:06:34, Matt Burgess (mattyb149@apache.org) wrote:
>>
>> Rather than restricting it to JSONPath, perhaps we should have a
>> RouteOnRecordPath or RouteRecord using the RecordPath API? Even better
>> would be the ability to use RecordPath functions in QueryRecord, but
>> that involves digging into Calcite as well. I realize JSONPath might
>> have more capabilities than RecordPath at the moment, but it seems a
>> shame to force the user to convert to JSON to use a "RouteOnJSONPath"
>> processor, the record-aware processors are meant to replace that kind
>> of format-specific functionality.
>>
>> Regards,
>> Matt
>>
>> On Tue, Mar 20, 2018 at 12:19 PM, Sivaprasanna
>> <sivaprasanna246@gmail.com> wrote:
>>> I like the idea that Otto suggested. RouteOnJSONPath makes more sense since
>>> writing the flattened JSON to attributes is then restricted to that
>>> processor alone.
>>>
>>> On Tue, Mar 20, 2018 at 8:37 PM, Otto Fowler <ottobackwards@gmail.com>
>>> wrote:
>>>
>>>> Why not create a new processor that does routeOnJSONPath and works on the
>>>> flow file?
>>>>
>>>>
>>>> On March 20, 2018 at 10:39:37, Jorge Machado (jomach@me.com) wrote:
>>>>
>>>> So that is what we are actually doing with EvaluateJsonPath. The problem with
>>>> that is that it is hard to build something generic if we need to specify each
>>>> property by its name; that’s why I had this idea.
>>>>
>>>> Should I make a PR for this, or is this too business-specific?
>>>>
>>>>
>>>> Jorge Machado
>>>>
>>>>> On 20 Mar 2018, at 15:30, Bryan Bende <bbende@gmail.com> wrote:
>>>>>
>>>>> Ok so I guess it depends whether you end up needing all 30 fields as
>>>>> attributes to achieve the logic in your flow, or if you only need a
>>>>> couple.
>>>>>
>>>>> If you only need a couple you could probably use EvaluateJsonPath
>>>>> after FlattenJson to extract just the couple of fields you need into
>>>>> attributes.
>>>>>
>>>>> If you need them all then I guess it makes sense to want the option to
>>>>> flatten into attributes.
>>>>>
>>>>> On Tue, Mar 20, 2018 at 10:14 AM, Jorge Machado <jomach@me.com> wrote:
>>>>>> From there on we use a lot of RouteOnAttribute and use those values in
>>>>>> SQL queries against other tables, like select * from someTable where
>>>>>> id=${myExtractedAttribute}
>>>>>> To be honest I tried JoltTransformJSON but I could not get it working :)
>>>>>>
>>>>>> Jorge Machado
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 20 Mar 2018, at 15:12, Matt Burgess <mattyb149@apache.org> wrote:
>>>>>>>
>>>>>>> I think Bryan is asking about what happens AFTER this part of the
>>>>>>> flow. For example, if you are doing routing you can use QueryRecord
>>>>>>> (and you won't need the SplitJson), if you are doing transformations
>>>>>>> you can use JoltTransformJSON (often without SplitJson as well), etc.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Matt
>>>>>>>
>>>>>>> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <jomach@me.com> wrote:
>>>>>>>> Hi Bryan,
>>>>>>>>
>>>>>>>> thanks for the help.
>>>>>>>> Our flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript
>>>>>>>> with attached code 1.
>>>>>>>>
>>>>>>>> We are now writing a custom processor that does this. It is a copy
>>>>>>>> of FlattenJson, but instead of putting the result into the flow file
>>>>>>>> content we put it into the attributes.
>>>>>>>> That’s why I asked if it makes sense to contribute this back.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Attached code 1:
>>>>>>>>
>>>>>>>> import org.apache.commons.io.IOUtils
>>>>>>>> import org.apache.nifi.processor.io.InputStreamCallback
>>>>>>>> import java.nio.charset.StandardCharsets
>>>>>>>>
>>>>>>>> def flowFile = session.get()
>>>>>>>> if (flowFile == null) {
>>>>>>>>     return
>>>>>>>> }
>>>>>>>> def slurper = new groovy.json.JsonSlurper()
>>>>>>>> def attrs = [:] as Map<String, String>
>>>>>>>> // Read the content as one JSON object and copy each non-empty
>>>>>>>> // top-level field into the attribute map.
>>>>>>>> session.read(flowFile, { inputStream ->
>>>>>>>>     def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>>>>>>>     def obj = slurper.parseText(text)
>>>>>>>>     obj.each { k, v ->
>>>>>>>>         if (v != null && v.toString() != "") {
>>>>>>>>             attrs[k] = v.toString()
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>> } as InputStreamCallback)
>>>>>>>> flowFile = session.putAllAttributes(flowFile, attrs)
>>>>>>>> session.transfer(flowFile, REL_SUCCESS)
>>>>>>>>
>>>>>>>> some code removed
>>>>>>>>
>>>>>>>>
>>>>>>>> Jorge Machado
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 20 Mar 2018, at 15:03, Bryan Bende <bbende@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Ok it is still not clear what the reason for needing it in attributes
>>>>>>>>> is though... Is there another processor you are using after this that
>>>>>>>>> only works off attributes?
>>>>>>>>>
>>>>>>>>> Just trying to understand if there is another way to accomplish what
>>>>>>>>> you want to do.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <jomach@me.com> wrote:
>>>>>>>>>> We are using NiFi for workflow, and we get columns from a database like
>>>>>>>>>> job_status and job_name and some nested JSON columns (30 columns).
>>>>>>>>>> We need to put them into the attributes of the flow file and not the
>>>>>>>>>> content. The first part (columns without JSON) is done by a Groovy
>>>>>>>>>> script, but it would then be nice to use this standard processor and,
>>>>>>>>>> instead of writing the result to the flow file content, write it to attributes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Jorge Machado
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 20 Mar 2018, at 14:47, Bryan Bende <bbende@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> What would be the main use case for wanting all the flattened values
>>>>>>>>>>> in attributes?
>>>>>>>>>>>
>>>>>>>>>>> If the reason was to keep the original content, we could probably just
>>>>>>>>>>> add an original relationship.
>>>>>>>>>>>
>>>>>>>>>>> Also, I think FlattenJson supports flattening a flow file where the
>>>>>>>>>>> root is an array of JSON documents (although I'm not totally sure), so
>>>>>>>>>>> you'd have to consider what to do in that case.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
>>>>>>>>>>> <pierre.villard.fr@gmail.com> wrote:
>>>>>>>>>>>> No, I do see how this could be convenient in some cases. My comment was
>>>>>>>>>>>> more: you can certainly submit a PR for that feature, but it'll need to be
>>>>>>>>>>>> clearly documented using the appropriate annotations, documentation, and
>>>>>>>>>>>> property descriptions.
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <jomach@me.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Pierre, I’m aware of that. So this means the change would not be
>>>>>>>>>>>>> accepted, correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jorge Machado
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard.fr@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this should be carefully documented to remind users that the
>>>>>>>>>>>>>> attributes are in memory. Doing what you propose would mean having in
>>>>>>>>>>>>>> memory the full content of the flow file as long as the flow file is
>>>>>>>>>>>>>> processed in the workflow (unless you remove attributes using
>>>>>>>>>>>>>> UpdateAttribute).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pierre
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jomach@me.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like to change the FlattenJson processor so that it can
>>>>>>>>>>>>>>> flatten to the attributes instead of only to the content. Is this a
>>>>>>>>>>>>>>> good idea? Would the PR be accepted?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jorge Machado
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>
