nifi-users mailing list archives

From Christopher Wilson <wilson...@gmail.com>
Subject Re: ReplaceText duplicates
Date Thu, 10 Sep 2015 17:53:23 GMT
That's awesome, thank you very much!

-Chris

On Thu, Sep 10, 2015 at 12:44 PM, Bryan Bende <bbende@gmail.com> wrote:

> Chris,
>
> I was stumped on this for a few minutes, but then realized I was only
> trying your template against the latest 0.3.0 code that has not been
> released.
> Sure enough, switching to the 0.2.1 release, I now see your issue where
> the content of the FlowFile is getting the matched value twice.
>
> The good news is this was identified and fixed for the upcoming release:
> https://issues.apache.org/jira/browse/NIFI-911
>
> It looks like in the meantime you could change the ReplaceText regular
> expression to (?s:^.*$) for the ReplaceText coming after ExtractText.
>
> Another ticket in 0.3.0 that may be relevant for you is this one:
> https://issues.apache.org/jira/browse/NIFI-808
>
> It allows you to turn off capturing group 0 since in a lot of cases this
> isn't used and could be large, so you would only end up with secaudit.json
> and secaudit.json.1
>
> -Bryan
>
>
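
A minimal sketch of why an anchored expression such as (?s:^.*$) can avoid doubled content. It assumes the original expression behaved like an unanchored `.*`; the thread does not show the template's actual ReplaceText settings, so this is an illustration rather than a diagnosis of NIFI-911. It uses Python's `re` module instead of the Java regex engine NiFi runs on, and `replace_content` is a hypothetical helper, not a NiFi API. On Python 3.7 and later, an unanchored `.*` matches once for the whole text and once more for the empty string left at the end, so the replacement is written twice.

```python
import re


def replace_content(pattern: str, content: str, replacement: str) -> str:
    """Replace every match of `pattern` in `content` with the literal `replacement`.

    Hypothetical stand-in for a regex-based replace; not NiFi code.
    """
    # A function replacement keeps backslashes in `replacement` literal.
    return re.sub(pattern, lambda _match: replacement, content)


# Stand-ins for the FlowFile content and the secaudit.json attribute value.
content = '{"priority": "INFO"}'
attribute = '{"priority": "INFO"}'

# Unanchored: one match for the full text plus one empty match at the end.
unanchored = replace_content(r".*", content, attribute)

# Anchored, with "." also matching newlines: exactly one match.
anchored = replace_content(r"(?s:^.*$)", content, attribute)

print(unanchored == attribute * 2)
print(anchored == attribute)
```

The anchored form works because `^` can only match at the start of the text, so no second, empty match is possible once the first match has consumed everything.
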
> On Thu, Sep 10, 2015 at 12:16 PM, Christopher Wilson <wilsoncj1@gmail.com>
> wrote:
>
>> The behavior I see is for the ExtractText -> ReplaceText path where the
>> attributes, secaudit.json, secaudit.json.0, and secaudit.json.1 are
>> concatenated into the payload (below).
>>
>> What I expected was that the attribute, secaudit.json, would have
>> replaced the payload.  I've tried .0 and .1 as the replacement attribute
>> and I still see the same behavior.
>>
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}{"priority": "INFO", "event_type":
>> "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460",
>> "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "
>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>> -Chris
>>
>> On Thu, Sep 10, 2015 at 11:55 AM, Bryan Bende <bbende@gmail.com> wrote:
>>
>>> Chris,
>>>
>>> I've been playing around with your template, and as far as I can tell
>>> both routes (ExtractText+ReplaceText vs. just ReplaceText) are producing a
>>> FlowFile with the same content; the difference is in the attributes...
>>>
>>> For ExtractText + ReplaceText I see this:
>>>
>>> Key: 'secaudit.json'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> Key: 'secaudit.json.0'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> Key: 'secaudit.json.1'
>>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator":
>>> {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>>> --------------------------------------------------
>>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>>
>>>
>>> The content/payload is the part below the --------------------, and the
>>> three attributes secaudit.json, secaudit.json.0, and secaudit.json.1 are
>>> the resulting attributes from ExtractText.
>>> The reason for those three attributes is that ExtractText puts the first
>>> match into an attribute with the name of the property you specified
>>> (secaudit.json). Then it puts the entire match into index 0 (in case you
>>> had multiple capture groups this would have them all), and then it puts
>>> each capture group after that, starting with 1.
>>>
>>> For the ReplaceText by itself I see:
>>> ....
>>> --------------------------------------------------
>>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>>> "service/security/account/user", "id":
>>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>>> "service/security", "id":
>>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>>> "outcome": "success", "id":
>>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>>
>>>
>>> Is this the same behavior you are seeing?
>>>
>>>
>>> -Bryan
>>>
>>>
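
The attribute naming Bryan describes can be sketched roughly as follows. This is an illustration built only from the description in this thread, not NiFi's ExtractText implementation. The helper `extract_attributes`, its `include_group_zero` parameter (loosely modelled on the NIFI-808 option mentioned in the thread), the sample input, and the regular expression are all hypothetical. It also assumes that "first match" means the first capture group, falling back to the whole match when the pattern has no groups.

```python
import re


def extract_attributes(name, pattern, content, include_group_zero=True):
    """Build attribute names in the style described in the thread (hypothetical)."""
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return {}

    group_count = match.re.groups
    attributes = {}

    # Assumption: the base attribute holds the first capture group,
    # or the whole match if the pattern defines no capture groups.
    attributes[name] = match.group(1) if group_count >= 1 else match.group(0)

    # Index 0 is the entire match; it can be skipped to save space.
    if include_group_zero:
        attributes[f"{name}.0"] = match.group(0)

    # Each capture group follows, numbered from 1.
    for index in range(1, group_count + 1):
        attributes[f"{name}.{index}"] = match.group(index)

    return attributes


# Made-up input: some prefix text followed by a JSON object.
sample = 'some-prefix {"priority": "INFO"}'
attrs = extract_attributes("secaudit.json", r"(\{.*\})", sample)
for key in sorted(attrs):
    print(key, "=", attrs[key])
```

With a single capture group that spans the whole match, all three attributes carry the same value, which matches the three identical values listed above.
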
>>> On Thu, Sep 10, 2015 at 11:22 AM, Matt Gilman <matt.c.gilman@gmail.com>
>>> wrote:
>>>
>>>> Chris,
>>>>
>>>> Since you're dealing with JSON data, you may want to consider using
>>>> EvaluateJsonPath. It supports specifying XPath-like expressions to extract
>>>> values and store them in FlowFile attributes or content. If you're
>>>> extracting into attributes, you can evaluate multiple paths. However, if
>>>> you're extracting into FlowFile content, you can only specify a single path.
>>>>
>>>> I'll take a look at your template to see what's going on.
>>>>
>>>> Matt
>>>>
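
To illustrate the idea behind Matt's suggestion outside NiFi, here is a small sketch that pulls several values out of a trimmed copy of the audit event by path and stores them as attributes. The helper `resolve_path` is hypothetical and handles only simple dotted paths such as `$.payload.outcome`; it is not a JSONPath implementation and not how EvaluateJsonPath works internally. The attribute names are made up for the example.

```python
import json


def resolve_path(document, path):
    """Follow a simple '$.a.b' style path through nested dicts (hypothetical)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    # Drop the leading '$' and walk each non-empty key in turn.
    for key in path[1:].split("."):
        if key:
            current = current[key]
    return current


# Trimmed copy of the event shown in this thread.
event = json.loads(
    '{"event_type": "identity.authenticate",'
    ' "payload": {"action": "authenticate", "outcome": "success"}}'
)

# Several paths can be evaluated when the results go into attributes.
paths = {
    "audit.event_type": "$.event_type",
    "audit.outcome": "$.payload.outcome",
}
attributes = {name: resolve_path(event, path) for name, path in paths.items()}
print(attributes)
```
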
>>>> On Thu, Sep 10, 2015 at 11:00 AM, Christopher Wilson <
>>>> wilsoncj1@gmail.com> wrote:
>>>>
>>>>> I've run into an issue with ReplaceText on another thread but thought
>>>>> I'd move this over to its own.
>>>>>
>>>>> What I have is a syslog entry from OpenStack that contains CADF (Cloud
>>>>> Audit Data Federation) JSON as the payload.  In the context of OpenStack
>>>>> these are login/security events that we'd like to see outside of a normal
>>>>> syslog stream and passed directly over to the security team.  I'd started
>>>>> down the path of ExtractText and pulling out the associated JSON into an
>>>>> attribute, but found that when I wired in a ReplaceText and tried to
>>>>> replace the content with the attribute, 3 copies of the JSON data were
>>>>> written to the file content.
>>>>>
>>>>> What I've since learned is I can just replace the text in place
>>>>> without yanking it into an attribute.  However, I can see cases where I
>>>>> might want to replace/append text using one or more attributes.  Wanted to
>>>>> see if others have handled this differently and if there is an enhancement
>>>>> request in the offing?
>>>>>
>>>>> I put the template I was working from, with a line of the syslog data,
>>>>> up on GitHub in case anyone wants to see this behavior in action.  You just
>>>>> have to play with turning processors on/off when viewing the full bulletin
>>>>> board.
>>>>>
>>>>> https://github.com/cj-wilson/NiFi-Templates
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> -Chris
>>>>>
>>>>
>>>>
>>>
>>
>
