That's awesome, thank you very much!

-Chris

On Thu, Sep 10, 2015 at 12:44 PM, Bryan Bende <bbende@gmail.com> wrote:
Chris,

I was stumped on this for a few minutes, but then realized I was only trying your template against the latest 0.3.0 code that has not been released.
Sure enough, switching to the 0.2.1 release, I now see your issue where the content of the FlowFile is getting the matched value twice.

The good news is this was identified and fixed for the upcoming release:
https://issues.apache.org/jira/browse/NIFI-911

In the meantime, it looks like you could change the regular expression to (?s:^.*$) on the ReplaceText that comes after ExtractText.
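
A minimal sketch of why the anchored pattern avoids the doubled content, assuming the ReplaceText pattern in use was an unanchored (.*) (the thread does not state the original pattern). NiFi evaluates Java regular expressions; Python's re module (3.7 or later) treats the trailing empty match the same way for this case, so it is used here purely for illustration. The content string is a shortened stand-in for the real event.

```python
import re

# Shortened stand-in for the FlowFile content and the secaudit.json attribute.
content = '{"event_type": "identity.authenticate"}'
attribute_value = content


def replace_all(pattern, replacement, text):
    # A function replacement keeps any backslashes in the JSON from being
    # interpreted as escape sequences in the replacement string.
    return re.sub(pattern, lambda _match: replacement, text)


# Assumed original pattern: it matches the whole content, then matches the
# empty string at the end a second time, so the replacement is written twice.
unanchored = replace_all(r"(.*)", attribute_value, content)

# Workaround pattern from this thread: ^ can only match at position 0,
# so there is exactly one match and one replacement.
anchored = replace_all(r"(?s:^.*$)", attribute_value, content)

print(unanchored == attribute_value * 2)
print(anchored == attribute_value)
```

Both comparisons hold on Python 3.7+, which mirrors the "matched value twice" symptom and the single replacement produced by the workaround.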

Another ticket in 0.3.0 that may be relevant for you is this one:
https://issues.apache.org/jira/browse/NIFI-808

It allows you to turn off capturing group 0, since in a lot of cases it isn't used and could be large, so you would end up with only secaudit.json and secaudit.json.1.

-Bryan


On Thu, Sep 10, 2015 at 12:16 PM, Christopher Wilson <wilsoncj1@gmail.com> wrote:
The behavior I see is for the ExtractText -> ReplaceText path where the attributes, secaudit.json, secaudit.json.0, and secaudit.json.1 are concatenated into the payload (below).

What I expected was that the attribute, secaudit.json, would have replaced the payload.  I've tried .0 and .1 as the replacement attribute and I still see the same behavior.

{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}

-Chris

On Thu, Sep 10, 2015 at 11:55 AM, Bryan Bende <bbende@gmail.com> wrote:
Chris,

I've been playing around with your template, and as far as I can tell both routes (ExtractText+ReplaceText vs. just ReplaceText) are producing a FlowFile with the same content; the difference is in the attributes.

For ExtractText + ReplaceText I see this:

Key: 'secaudit.json'
Value: '{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
Key: 'secaudit.json.0'
Value: '{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
Key: 'secaudit.json.1'
Value: '{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}'
--------------------------------------------------
{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"}


The content/payload is the part below the --------------------, and the three attributes secaudit.json, secaudit.json.0, and secaudit.json.1 are the resulting attributes from ExtractText.
The reason for those three attributes is that it puts the first match into an attribute with the name of the property you specified (secaudit.json), then it puts the entire match into index 0 (in case you had multiple capture groups, this would have them all), then it puts each capture group after that, starting with 1.
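
The naming scheme described above can be sketched roughly as follows. This is an illustration in Python, not NiFi's code: the function name extract_attributes, the sample syslog line, and the pattern are all made up for the example, and include_group_zero is a hypothetical name for the kind of switch tracked in NIFI-808. It also assumes the bare property name carries capture group 1 when the pattern has one.

```python
import re


def extract_attributes(name, pattern, content, include_group_zero=True):
    """Mimic the attribute naming described above (illustrative only)."""
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return {}
    group_count = match.re.groups
    attributes = {}
    # Assumption: the bare property name carries capture group 1 when the
    # pattern has one, otherwise the entire match.
    attributes[name] = match.group(1) if group_count >= 1 else match.group(0)
    if include_group_zero:
        # Index 0 is the entire match.
        attributes[f"{name}.0"] = match.group(0)
    # Each capture group follows, starting at index 1.
    for index in range(1, group_count + 1):
        attributes[f"{name}.{index}"] = match.group(index)
    return attributes


# Hypothetical, shortened syslog line with a JSON payload.
line = 'Aug 18 23:29:17 host keystone: {"action": "authenticate"}'
attrs = extract_attributes("secaudit.json", r"(\{.*\})", line)
for key in sorted(attrs):
    print(key, "=", attrs[key])
```

With a single capture group that spans the whole match, all three attributes hold the same JSON, which is consistent with the dump above. Passing include_group_zero=False leaves only secaudit.json and secaudit.json.1.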

For the ReplaceText by itself I see:
....
--------------------------------------------------
{"priority": "INFO", "event_type": "identity.authenticate", "timestamp": "2015-08-18 23:29:17.358460", "publisher_id": "identity.ip-10-0-0-60", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "initiator": {"typeURI": "service/security/account/user", "host": {"agent": "python-keystoneclient", "address": "10.0.0.60"}, "id": "cbd0f5c99e774b31bc4d9988ddfb698c"}, "target": {"typeURI": "service/security/account/user", "id": "openstack:036bdbcd-39ce-4545-956d-2a1a2c88dd6b"}, "observer": {"typeURI": "service/security", "id": "openstack:7c1bef2a-c90d-4f15-aa12-ec14bb990c7b"}, "eventType": "activity", "eventTime": "2015-08-18T23:29:17.358172+0000", "action": "authenticate", "outcome": "success", "id": "openstack:305e6c25-93ee-4897-ab87-20092d14db95"}, "message_id": "8c5c8576-9850-4920-a1d5-1053e2c704d7"} 


Is this the same behavior you are seeing?


-Bryan


On Thu, Sep 10, 2015 at 11:22 AM, Matt Gilman <matt.c.gilman@gmail.com> wrote:
Chris,

Since you're dealing with JSON data, you may want to consider using EvaluateJsonPath. It supports specifying XPath-like expressions to extract values and store them in FlowFile attributes or content. If you're extracting into attributes, you can evaluate multiple paths. However, if you're extracting into FlowFile content, you can only specify a single path.
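
As a rough idea of what path-based extraction into attributes looks like, here is a small Python sketch. It is not a JsonPath implementation and not how EvaluateJsonPath works internally; evaluate_path is a hypothetical helper that handles only simple dotted keys, and the attribute names audit.action and audit.outcome are invented for the example. The event is a trimmed version of the payload from this thread.

```python
import json


def evaluate_path(document, path):
    """Resolve a simple dotted path such as '$.payload.action'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    # Walk one key at a time; the empty piece left by the leading '$' is skipped.
    for key in path.lstrip("$").split("."):
        if key:
            current = current[key]
    return current


# Trimmed version of the CADF event shown in this thread.
event = json.loads(
    '{"event_type": "identity.authenticate",'
    ' "payload": {"action": "authenticate", "outcome": "success"}}'
)

# Several paths can be evaluated when the destination is attributes.
paths = {
    "audit.action": "$.payload.action",
    "audit.outcome": "$.payload.outcome",
}
attributes = {name: evaluate_path(event, path) for name, path in paths.items()}
print(attributes)
```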

I'll take a look at your template to see what's going on.

Matt

On Thu, Sep 10, 2015 at 11:00 AM, Christopher Wilson <wilsoncj1@gmail.com> wrote:
I ran into an issue with ReplaceText on another thread but thought I'd move this over to its own.

What I have is a syslog entry from OpenStack that contains CADF (Cloud Audit Data Federation) JSON as the payload.  In the context of OpenStack these are login/security events that we'd like to see outside of a normal syslog stream and passed directly over to the security team.  I'd started down the path of using ExtractText to pull the associated JSON into an attribute, but found that when I wired in a ReplaceText and tried to replace the content with the attribute, 3 copies of the JSON data were written to the file content.

What I've since learned is I can just replace the text in place without yanking it into an attribute.  However, I can see cases where I might want to replace/append text using one or more attributes.  I wanted to see if others have handled this differently and if there is an enhancement request in the offing.

I put the template I was working from, with a line of the syslog data, up on GitHub in case anyone wants to see this behavior in action.  You just have to play with turning processors on/off when viewing the full bulletin board.