nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: ReplaceText duplicates
Date Thu, 10 Sep 2015 16:44:12 GMT
Chris,

I was stumped on this for a few minutes, but then realized I was only
trying your template against the latest 0.3.0 code that has not been
released.
Sure enough, switching to the 0.2.1 release, I now see your issue where the
content of the FlowFile is getting the matched value twice.

The good news is this was identified and fixed for the upcoming release:
https://issues.apache.org/jira/browse/NIFI-911

It looks like in the meantime you could change the ReplaceText regular
expression to (?s:^.*$) for the ReplaceText coming after ExtractText.
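
As a rough illustration of why an anchored expression can help, here is a minimal sketch using plain java.util.regex rather than NiFi itself. It assumes the earlier expression was an unanchored pattern such as (.*), which is a guess and is not stated in this thread; the class name, method name, and sample content are hypothetical. With an unanchored (.*), Java's replaceAll finds two matches (the whole string, then the empty string at the end), so the replacement is written twice. With (?s:^.*$) only one match is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceTextRegexDemo {

    // Replace every match of `regex` in `content` with `replacement`,
    // treating the replacement as literal text (no $1 or backslash handling).
    static String replace(String regex, String content, String replacement) {
        return Pattern.compile(regex)
                .matcher(content)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Hypothetical single-line content and extracted JSON value.
        String content = "syslog prefix {\"a\": 1}";
        String json = "{\"a\": 1}";

        // Assumed earlier pattern: matches the whole line, then matches the
        // empty string at the end of input, so the value appears twice.
        System.out.println(replace("(.*)", content, json));

        // Suggested workaround: ^ can only match at the start of input,
        // so there is exactly one match and one replacement.
        System.out.println(replace("(?s:^.*$)", content, json));
    }
}
```

Whether this is exactly the mechanism behind NIFI-911 is an assumption on my part; the sketch only shows the standard regex behavior.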

Another ticket in 0.3.0 that may be relevant for you is this one:
https://issues.apache.org/jira/browse/NIFI-808

It allows you to turn off capturing group 0, since in a lot of cases it
isn't used and could be large. With it turned off you would only end up
with secaudit.json and secaudit.json.1.
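
To make the attribute naming concrete, here is a minimal sketch in plain Java. It is not NiFi's actual implementation. The class and method names, the includeGroupZero flag, the sample syslog line, and the regular expression are all hypothetical. It also assumes the regex has at least one capture group and that the base attribute holds the first capture group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTextNamingDemo {

    // Build attributes named <property>, <property>.0, <property>.1, ...
    // from the first match of `regex` in `content`.
    // Assumes `regex` contains at least one capture group.
    static Map<String, String> extract(String property, String regex,
                                       String content, boolean includeGroupZero) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(content);
        if (!m.find()) {
            return attrs; // no match, no attributes
        }
        // Base attribute: assumed here to be the first capture group.
        attrs.put(property, m.group(1));
        // Index 0 is the entire match; indexes 1..n are the capture groups.
        int start = includeGroupZero ? 0 : 1;
        for (int i = start; i <= m.groupCount(); i++) {
            attrs.put(property + "." + i, m.group(i));
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Hypothetical syslog line with a JSON payload at the end.
        String line = "<14>Aug 18 23:29:17 host keystone: {\"a\": 1}";
        String regex = "(\\{.*\\})";

        // Three attributes: secaudit.json, secaudit.json.0, secaudit.json.1
        System.out.println(extract("secaudit.json", regex, line, true).keySet());

        // Group 0 turned off: secaudit.json and secaudit.json.1 only
        System.out.println(extract("secaudit.json", regex, line, false).keySet());
    }
}
```

Because the single capture group here spans the whole match, all of the attributes carry the same value, which matches the three identical copies shown further down the thread.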

-Bryan


On Thu, Sep 10, 2015 at 12:16 PM, Christopher Wilson <wilsoncj1@gmail.com>
wrote:

> The behavior I see is for the ExtractText -> ReplaceText path where the
> attributes, secaudit.json, secaudit.json.0, and secaudit.json.1 are
> concatenated into the payload (below).
>
> What I expected was that the attribute, secaudit.json, would have replaced
> the payload.  I've tried .0 and .1 as the replacement attribute and I still
> see the same behavior.
>
> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
> "python-keystoneclient", "address": "10.0.0.60"}, "id":
> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
> "service/security/account/user", "id":
> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
> "service/security", "id":
> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
> "outcome": "success", "id":
> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}{"priority": "INFO", "event_type":
> "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460",
> "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "
> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
> "service/security/account/user", "host": {"agent": "python-keystoneclient",
> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
> "target": {"typeURI": "service/security/account/user", "id":
> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
> "service/security", "id":
> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
> "outcome": "success", "id":
> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>
> -Chris
>
> On Thu, Sep 10, 2015 at 11:55 AM, Bryan Bende <bbende@gmail.com> wrote:
>
>> Chris,
>>
>> I've been playing around with your template, and as far as I can tell
>> both routes (ExtractText+ReplaceText vs. just ReplaceText) are producing a
>> FlowFile with the same content; the difference is in the attributes...
>>
>> For ExtractText + ReplaceText I see this:
>>
>> Key: 'secaudit.json'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> Key: 'secaudit.json.0'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> Key: 'secaudit.json.1'
>> Value: '{"priority": "INFO", "event_type": "identity.authenticate",
>> "timestamp": "2015-08-18 23:29:17.358460", "publisher_id":
>> "identity.ip-10-0-0-60", "payload": {"typeURI": "
>> http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI":
>> "service/security/account/user", "host": {"agent": "python-keystoneclient",
>> "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"},
>> "target": {"typeURI": "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
>> --------------------------------------------------
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>>
>> The content/payload is the part below the --------------------, and the
>> three attributes secaudit.json, secaudit.json.0, and secaudit.json.1 are
>> the resulting attributes from ExtractText.
>> The reason for those three attributes is that ExtractText first puts the
>> first match into an attribute with the name of the property you specified
>> (secaudit.json). It then puts the entire match into index 0 (if you had
>> multiple capture groups, this would contain them all), and finally puts
>> each capture group into its own index, starting with 1.
>>
>> For the ReplaceText by itself I see:
>> ....
>> --------------------------------------------------
>> {"priority": "INFO", "event_type": "identity.authenticate", "timestamp":
>> "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60",
>> "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
>> "initiator": {"typeURI": "service/security/account/user", "host": {"agent":
>> "python-keystoneclient", "address": "10.0.0.60"}, "id":
>> "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI":
>> "service/security/account/user", "id":
>> "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI":
>> "service/security", "id":
>> "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity",
>> "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate",
>> "outcome": "success", "id":
>> "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id":
>> "8c5c8576-9850-4920-a1d5-1053e2c704d7"}
>>
>>
>> Is this the same behavior you are seeing?
>>
>>
>> -Bryan
>>
>>
>> On Thu, Sep 10, 2015 at 11:22 AM, Matt Gilman <matt.c.gilman@gmail.com>
>> wrote:
>>
>>> Chris,
>>>
>>> Since you're dealing with JSON data, you may want to consider using
>>> EvaluateJsonPath. It supports specifying XPath-like expressions to extract
>>> values and store them in FlowFile attributes or content. If you're
>>> extracting into attributes, you can evaluate multiple paths. However, if
>>> you're extracting into FlowFile content, you can only specify a single path.
>>>
>>> I'll take a look at your template to see what's going on.
>>>
>>> Matt
>>>
>>> On Thu, Sep 10, 2015 at 11:00 AM, Christopher Wilson <
>>> wilsoncj1@gmail.com> wrote:
>>>
>>>> I've run into an issue with ReplaceText on another thread but thought
>>>> I'd move this over to its own.
>>>>
>>>> What I have is a syslog entry from OpenStack that contains CADF (Cloud
>>>> Audit Data Federation) JSON as the payload.  In the context of OpenStack
>>>> these are login/security events that we'd like to see outside of a normal
>>>> syslog stream and passed directly over to the security team.  I'd started
>>>> down the path of ExtractText and pulling out the associated JSON into an
>>>> attribute but found that when I wired in a ReplaceText and tried to
>>>> replace the content with the attribute, 3 copies of the JSON data were
>>>> written to the file content.
>>>>
>>>> What I've since learned is I can just replace the text in place without
>>>> yanking into an attribute.  However, I can see cases where I might want to
>>>> replace/append text using one or more attributes.  Wanted to see if others
>>>> have handled this differently and if there is an enhancement request in the
>>>> offing?
>>>>
>>>> I put the template I was working from, with a line of the syslog data,
>>>> up on GitHub in case anyone wants to see this behavior in action.  You just
>>>> have to play with turning processors on/off when viewing the full bulletin
>>>> board.
>>>>
>>>> https://github.com/cj-wilson/NiFi-Templates
>>>>
>>>> Thanks in advance.
>>>>
>>>> -Chris
>>>>
>>>
>>>
>>
>
