nifi-dev mailing list archives

From Jorge Machado <jom...@me.com>
Subject Re: FlattenJson
Date Fri, 23 Mar 2018 15:07:54 GMT
So I'm pretty lost now: none of Matt's suggestions solve my problem, which is that I need
to have all the contents of a flow file as attribute key/value pairs.

A good place for it would be ConvertAvroToJSON: it could have an option to say whether the
output goes to attributes or to the flow file content, defaulting to the flow file content.

Would that change be accepted? I would create a PR for it.


Jorge Machado
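The following is a minimal Java sketch of the option described above. It assumes a hypothetical two-valued Destination setting; the names are illustrative, and the thread does not define any such ConvertAvroToJSON property. The same record becomes either a flat JSON string for the content or a map of attributes, with content as the default. The JSON writer is deliberately naive and only handles flat string values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DestinationSketch {

    // Hypothetical option values; not an existing ConvertAvroToJSON property.
    enum Destination { FLOWFILE_CONTENT, FLOWFILE_ATTRIBUTE }

    // Result of one conversion: either new content or new attributes.
    static final class Output {
        final String content;                 // null when writing to attributes
        final Map<String, String> attributes; // empty when writing to content

        Output(String content, Map<String, String> attributes) {
            this.content = content;
            this.attributes = attributes;
        }
    }

    // Minimal JSON string escaping (quotes and backslashes only), enough for this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static Output write(Map<String, String> record, Destination destination) {
        if (destination == Destination.FLOWFILE_ATTRIBUTE) {
            // Copy so the attributes do not share state with the caller's map.
            return new Output(null, new LinkedHashMap<>(record));
        }
        // Default branch: serialize the record as a flat JSON object.
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (json.length() > 1) json.append(',');
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return new Output(json.append('}').toString(), Map.of());
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("job_name", "nightly");
        row.put("job_status", "DONE");
        System.out.println(write(row, Destination.FLOWFILE_CONTENT).content);
        // {"job_name":"nightly","job_status":"DONE"}
        System.out.println(write(row, Destination.FLOWFILE_ATTRIBUTE).attributes);
        // {job_name=nightly, job_status=DONE}
    }
}
```

Content is the fall-through branch, which matches the proposed default. Existing flows would therefore behave as before unless the option is changed.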





> On 20 Mar 2018, at 22:35, Otto Fowler <ottobackwards@gmail.com> wrote:
> 
> We could start with RouteOnJSONPath and add RecordPath support as the need
> arises?
> 
> 
> On March 20, 2018 at 16:06:34, Matt Burgess (mattyb149@apache.org) wrote:
> 
> Rather than restricting it to JSONPath, perhaps we should have a
> RouteOnRecordPath or RouteRecord using the RecordPath API? Even better
> would be the ability to use RecordPath functions in QueryRecord, but
> that involves digging into Calcite as well. I realize JSONPath might
> have more capabilities than RecordPath at the moment, but it seems a
> shame to force the user to convert to JSON to use a "RouteOnJSONPath"
> processor; the record-aware processors are meant to replace that kind
> of format-specific functionality.
> 
> Regards,
> Matt
> 
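To make the routing idea concrete, here is a minimal Java sketch. It evaluates each rule's path against an already-parsed document (nested Maps) and picks the first matching route. The dotted path syntax is a simplified stand-in for JSONPath or RecordPath, and the relationship names are hypothetical. Because it works on parsed values rather than JSON text, the same logic would apply to any record format, in the spirit of Matt's suggestion.

```java
import java.util.List;
import java.util.Map;

class PathRouter {

    // One routing rule: send to `relationship` when the value at `path` equals `expected`.
    static final class Rule {
        final String relationship, path, expected;

        Rule(String relationship, String path, String expected) {
            this.relationship = relationship;
            this.path = path;
            this.expected = expected;
        }
    }

    // Walks a dotted path such as "job.status" through nested maps.
    // Returns null when a segment is missing or an intermediate value is not a map.
    static Object resolve(Map<String, ?> doc, String path) {
        Object current = doc;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // First matching rule wins; otherwise fall through to "unmatched".
    static String route(Map<String, ?> doc, List<Rule> rules) {
        for (Rule rule : rules) {
            Object value = resolve(doc, rule.path);
            if (value != null && String.valueOf(value).equals(rule.expected)) {
                return rule.relationship;
            }
        }
        return "unmatched";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("job", Map.of("status", "FAILED"));
        List<Rule> rules = List.of(
            new Rule("failed", "job.status", "FAILED"),
            new Rule("done", "job.status", "DONE"));
        System.out.println(route(doc, rules)); // failed
    }
}
```

Routing reads the values straight from the parsed content, so nothing has to be copied into attributes first.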
> On Tue, Mar 20, 2018 at 12:19 PM, Sivaprasanna
> <sivaprasanna246@gmail.com> wrote:
>> I like the idea that Otto suggested. RouteOnJSONPath makes more sense, since
>> writing the flattened JSON to attributes would then be restricted to that
>> processor alone.
>> 
>> On Tue, Mar 20, 2018 at 8:37 PM, Otto Fowler <ottobackwards@gmail.com>
>> wrote:
>> 
>>> Why not create a new processor that does routeOnJSONPath and works on
>>> the flow file?
>>> 
>>> 
>>> On March 20, 2018 at 10:39:37, Jorge Machado (jomach@me.com) wrote:
>>> 
>>> That is actually what we are doing with EvaluateJsonPath. The problem with
>>> that is that it is hard to build something generic if we need to specify
>>> each property by its name; that's why I had this idea.
>>> 
>>> Should I make a PR for this, or is it too business-specific?
>>> 
>>> 
>>> Jorge Machado
>>> 
>>>> On 20 Mar 2018, at 15:30, Bryan Bende <bbende@gmail.com> wrote:
>>>> 
>>>> Ok so I guess it depends whether you end up needing all 30 fields as
>>>> attributes to achieve the logic in your flow, or if you only need a
>>>> couple.
>>>> 
>>>> If you only need a couple you could probably use EvaluateJsonPath
>>>> after FlattenJson to extract just the couple of fields you need into
>>>> attributes.
>>>> 
>>>> If you need them all then I guess it makes sense to want the option to
>>>> flatten into attributes.
>>>> 
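Bryan's "only a couple of fields" option can be sketched as follows. The sketch assumes the content has already been flattened into a key/value map. Each entry maps an attribute name to a flattened key, loosely like one EvaluateJsonPath dynamic property per attribute. The keys are illustrative. Skipping missing fields is this sketch's own choice, not necessarily what EvaluateJsonPath does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SelectFields {

    // wanted: attributeName -> flattened key (e.g. "status" -> "job.status").
    static Map<String, String> extract(Map<String, String> flattened,
                                       Map<String, String> wanted) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            String value = flattened.get(e.getValue());
            // Sketch's choice: skip missing fields instead of writing empty attributes.
            if (value != null) attributes.put(e.getKey(), value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> flattened = Map.of(
            "job.name", "nightly", "job.status", "DONE", "job.owner", "etl");
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("status", "job.status");
        wanted.put("name", "job.name");
        System.out.println(extract(flattened, wanted)); // {status=DONE, name=nightly}
    }
}
```

Only the requested values reach the attribute map, which keeps the attribute footprint small when most of the 30 columns are not needed for routing.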
>>>> On Tue, Mar 20, 2018 at 10:14 AM, Jorge Machado <jomach@me.com> wrote:
>>>>> From there on we use a lot of RouteOnAttribute processors and use those
>>>>> values in SQL queries against other tables, like select * from someTable
>>>>> where id=${myExtractedAttribute}
>>>>> To be honest, I tried JoltTransformJSON but I could not get it working :)
>>>>> 
>>>>> Jorge Machado
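For context, the attribute values end up in queries such as select * from someTable where id=${myExtractedAttribute} through NiFi Expression Language. The Java sketch below is a toy resolver and is not how NiFi evaluates expressions. It handles only plain ${name} references, with none of Expression Language's functions, and it substitutes an empty string for missing attributes. Its purpose is to illustrate why the flow depends on the values being present as attributes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeTemplate {

    // Matches ${name}; the attribute name is captured in group 1.
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String template, Map<String, String> attributes) {
        Matcher m = REF.matcher(template);
        // quoteReplacement keeps '$' and '\' in attribute values literal.
        return m.replaceAll(r -> Matcher.quoteReplacement(
            attributes.getOrDefault(r.group(1), "")));
    }

    public static void main(String[] args) {
        String sql = "select * from someTable where id=${myExtractedAttribute}";
        System.out.println(resolve(sql, Map.of("myExtractedAttribute", "42")));
        // select * from someTable where id=42
    }
}
```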
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 20 Mar 2018, at 15:12, Matt Burgess <mattyb149@apache.org> wrote:
>>>>>> 
>>>>>> I think Bryan is asking about what happens AFTER this part of the
>>>>>> flow. For example, if you are doing routing you can use QueryRecord
>>>>>> (and you won't need the SplitJson), if you are doing transformations
>>>>>> you can use JoltTransformJSON (often without SplitJson as well), etc.
>>>>>> 
>>>>>> Regards,
>>>>>> Matt
>>>>>> 
>>>>>> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <jomach@me.com> wrote:
>>>>>>> Hi Bryan,
>>>>>>> 
>>>>>>> Thanks for the help.
>>>>>>> Our flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript
>>>>>>> with attached code 1.
>>>>>>>
>>>>>>> We are now writing a custom processor that does this. It is a copy
>>>>>>> of FlattenJson, but instead of putting the result into the flow file
>>>>>>> content, it puts it into the attributes.
>>>>>>> That's why I asked if it makes sense to contribute this back.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Attached code 1:
>>>>>>>
>>>>>>> import org.apache.commons.io.IOUtils
>>>>>>> import org.apache.nifi.processor.io.InputStreamCallback
>>>>>>> import java.nio.charset.StandardCharsets
>>>>>>>
>>>>>>> // ExecuteScript (Groovy): copy each top-level JSON field into an attribute.
>>>>>>> // Assumes the root is a JSON object (SplitJson upstream emits one per flow file).
>>>>>>> def flowFile = session.get()
>>>>>>> if (flowFile == null) {
>>>>>>>     return
>>>>>>> }
>>>>>>> def slurper = new groovy.json.JsonSlurper()
>>>>>>> def attrs = [:] as Map<String, String>
>>>>>>> session.read(flowFile, { inputStream ->
>>>>>>>     def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>>>>>>     def obj = slurper.parseText(text)
>>>>>>>     obj.each { k, v ->
>>>>>>>         // Skip null and empty values; nested objects are stored via toString()
>>>>>>>         if (v != null && v.toString() != "") {
>>>>>>>             attrs[k] = v.toString()
>>>>>>>         }
>>>>>>>     }
>>>>>>> } as InputStreamCallback)
>>>>>>> flowFile = session.putAllAttributes(flowFile, attrs)
>>>>>>> session.transfer(flowFile, REL_SUCCESS)
>>>>>>>
>>>>>>> some code removed
>>>>>>> 
>>>>>>> 
>>>>>>> Jorge Machado
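The script above copies only top-level keys, so a nested JSON column ends up as the toString() of a map rather than as separate attributes. The minimal Java sketch below flattens nested maps and lists into dot-separated keys. This is the kind of key/value output a FlattenJson-to-attributes option would produce. It works on already-parsed values (Map and List, as JsonSlurper returns). The "a.b" and "a[0]" key format is an assumption modeled on FlattenJson's default dot separator and is not guaranteed to match its output exactly. Like the script, it skips null and empty values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlattenToAttributes {

    static Map<String, String> flatten(Object root) {
        Map<String, String> out = new LinkedHashMap<>();
        flattenInto("", root, out);
        return out;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String key = prefix.isEmpty() ? String.valueOf(e.getKey())
                                              : prefix + "." + e.getKey();
                flattenInto(key, e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                // Index suffix; a root-level array yields keys such as "[0].job_name".
                flattenInto(prefix + "[" + i + "]", list.get(i), out);
            }
        } else if (value != null && !value.toString().isEmpty()) {
            // Same filter as the Groovy script: drop null and empty values.
            out.put(prefix, value.toString());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("job_name", "nightly");
        row.put("config", Map.of("retries", 3));
        row.put("tags", List.of("etl", "db"));
        System.out.println(flatten(row));
        // {job_name=nightly, config.retries=3, tags[0]=etl, tags[1]=db}
    }
}
```

A root-level array produces keys like [0].job_name. That is one possible answer to the array-root case raised in this thread; emitting one flow file per element would be another.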
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 Mar 2018, at 15:03, Bryan Bende <bbende@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Ok it is still not clear what the reason for needing it in attributes
>>>>>>>> is though... Is there another processor you are using after this that
>>>>>>>> only works off attributes?
>>>>>>>>
>>>>>>>> Just trying to understand if there is another way to accomplish what
>>>>>>>> you want to do.
>>>>>>>> 
>>>>>>>> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <jomach@me.com> wrote:
>>>>>>>>> We are using NiFi for workflows, and from a database we get columns like
>>>>>>>>> job_status and job_name plus some nested JSON columns (30 columns in total).
>>>>>>>>> We need to put them into the flow file's attributes, not its content. The
>>>>>>>>> first part (the columns without JSON) is done by a Groovy script, but it
>>>>>>>>> would be nice to use this standard processor and write the result to
>>>>>>>>> attributes instead of to the flow file content.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jorge Machado
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 20 Mar 2018, at 14:47, Bryan Bende <bbende@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> What would be the main use case for wanting all the flattened values
>>>>>>>>>> in attributes?
>>>>>>>>>>
>>>>>>>>>> If the reason was to keep the original content, we could probably just
>>>>>>>>>> add an original relationship.
>>>>>>>>>>
>>>>>>>>>> Also, I think FlattenJson supports flattening a flow file where the
>>>>>>>>>> root is an array of JSON documents (although I'm not totally sure), so
>>>>>>>>>> you'd have to consider what to do in that case.
>>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
>>>>>>>>>> <pierre.villard.fr@gmail.com> wrote:
>>>>>>>>>>> No, I do see how this could be convenient in some cases. My comment was
>>>>>>>>>>> more: you can certainly submit a PR for that feature, but it'll need to be
>>>>>>>>>>> clearly documented using the appropriate annotations, documentation, and
>>>>>>>>>>> property descriptions.
>>>>>>>>>>> 
>>>>>>>>>>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <jomach@me.com>:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Pierre, I'm aware of that. So this means the change would not be
>>>>>>>>>>>> accepted, correct?
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards
>>>>>>>>>>>> 
>>>>>>>>>>>> Jorge Machado
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard.fr@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think this should be carefully documented to remind users that the
>>>>>>>>>>>>> attributes are held in memory. Doing what you propose would mean keeping
>>>>>>>>>>>>> the full content of the flow file in memory for as long as the flow file
>>>>>>>>>>>>> is processed in the workflow (unless you remove the attributes using
>>>>>>>>>>>>> UpdateAttribute).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Pierre
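To put Pierre's warning in rough numbers, here is a Java sketch that sums the UTF-8 size of every attribute key and value and flags maps above a limit. It is only an estimate: actual heap usage differs because of per-object overhead and the JVM's internal string encoding. The 64 KB limit is an arbitrary example for this sketch, not a NiFi setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

class AttributeFootprint {

    // Hypothetical budget for this example only; NiFi does not define it.
    static final long LIMIT_BYTES = 64 * 1024;

    // Rough estimate: UTF-8 bytes of keys and values, ignoring JVM object overhead.
    static long estimateBytes(Map<String, String> attributes) {
        long total = 0;
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    static boolean tooLarge(Map<String, String> attributes) {
        return estimateBytes(attributes) > LIMIT_BYTES;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of(
            "job_name", "nightly", "payload", "x".repeat(100_000));
        System.out.println(estimateBytes(attrs) + " bytes, too large: " + tooLarge(attrs));
        // 100022 bytes, too large: true
    }
}
```

Since attributes stay with the flow file through every later processor, a check like this is most useful right where content is copied into attributes.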
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jomach@me.com>:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I would like to change the FlattenJson processor so that it can flatten
>>>>>>>>>>>>>> to attributes instead of only to content. Is this a good idea?
>>>>>>>>>>>>>> Would the PR be accepted?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jorge Machado
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 

