nifi-dev mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Help with loading a file into a cache
Date Fri, 30 Nov 2018 20:18:44 GMT
Dave,

Depending on the processor you're going to use to store these records
into cache (see Mike's reply), if you want to convert each of the
lines to JSON objects, you can use ReplaceText:

Search Value: ^([^:]+):(.*)
Replacement Value: {"$1":$2}
Replacement Strategy: Regex Replace
Evaluation Mode: Line-by-Line

This creates a valid JSON object on each line, with a single key (the
index value) whose value is the embedded JSON object. Then, as of
NiFi 1.7.0 [1], you can use record-based processors with a
JsonTreeReader, which will read one JSON object per line.
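
As a rough illustration of what that ReplaceText configuration does to a
single line, here is a minimal Python sketch. It is not NiFi code: Python's
re module stands in for the Java regex engine NiFi uses, \1 and \2 play the
role of $1 and $2, and the helper name wrap_line is made up for this
example. The sample line is taken from Dave's file.

```python
import json
import re

# Same pattern as the ReplaceText Search Value. [^:]+ stops at the first
# colon, so colons inside the JSON payload are left alone.
LINE_PATTERN = re.compile(r'^([^:]+):(.*)')


def wrap_line(line):
    """Mimic the Replacement Value {"$1":$2} for one line (hypothetical helper)."""
    return LINE_PATTERN.sub(r'{"\1":\2}', line)


sample = (
    'ZA109:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P9876", '
    '"starboard engine":"RR-S9876","gearboxes":[ "WHM9876", "WHI9876", "WHT9876" ] }}'
)

wrapped = wrap_line(sample)

# The wrapped line now parses as one JSON object keyed by the tail number.
record = json.loads(wrapped)
print(record["ZA109"]["Aircraft Type"])
```

Because the key group cannot contain a colon, the match always splits on
the first colon in the line, which is why the JSON payload survives intact.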

If you'd like to have the key as an attribute and only the JSON object
as the payload, you can use SplitText with Line Count = 1 to split the
file into individual flow files (1 line per file), then ExtractText to
get the key. Add a user-defined property (let's call it cache.key) to
ExtractText:

cache.key = ^([^:]+):.*

This extracts the key into an attribute called cache.key, but the key
and colon still remain in the flow file content, so you'll need a
ReplaceText to remove them:

Search Value: ^([^:]+):(.*)
Replacement Value: $2
Replacement Strategy: Regex Replace
Evaluation Mode: Line-by-Line
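
To make the split/extract/replace sequence concrete, here is a small Python
sketch of the same idea outside NiFi. It is only an approximation under
stated assumptions: Python's re module stands in for NiFi's Java regex, a
plain dict stands in for whichever cache you end up using, and the names
split_line and cache are invented for this example. The two sample lines
come from Dave's file.

```python
import re

# Same regex as the ExtractText / ReplaceText patterns above.
KEY_VALUE = re.compile(r'^([^:]+):(.*)')


def split_line(line):
    """Return (key, value): the text before the first colon and everything after it."""
    match = KEY_VALUE.match(line)
    if match is None:
        raise ValueError(f"no key/value separator in line: {line!r}")
    return match.group(1), match.group(2)


lines = [
    'ZA104:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P8765", '
    '"starboard engine":"RR-S8765","gearboxes":[ "WHM8765", "WHI8765", "WHT8765" ] }}',
    'ZA103:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P7654", '
    '"starboard engine":"RR-S7654","gearboxes":[ "WHM7654", "WHI7654", "WHT7654" ] }}',
]

# One entry per line: group 1 plays the role of the cache.key attribute,
# group 2 is what would remain as the flow file content.
cache = {}
for line in lines:
    key, value = split_line(line)
    cache[key] = value

print(sorted(cache))
```

The value is kept as the raw JSON string, which matches Dave's requirement
that the cached item can stay as a JSON string.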

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-4456

On Fri, Nov 30, 2018 at 2:47 PM DAVID SMITH <davidrsmith@btinternet.com.invalid> wrote:
>
> Hi
>
> As requested here is an example file with some redacted data:
>
> ZA105:{"Aircraft Type":"Sea King", "Lifed Items":{ "port engine ser#":"RR-P1234", "starboard engine ser#":"RR-S1234","gearboxes ser#":[ "WHM1234", "WHI1234", "WHT1234" ] }}
> ZA106:{"Aircraft Type":"Sea King", "Lifed Items":{ "port engine ser#":"RR-P2345", "starboard engine ser#":"RR-S2345","gearboxes ser#":[ "WHM2345", "WHI2345", "WHT2345" ] }}
> ZA107:{"Aircraft Type":"Merlin", "Lifed Items":{ "port engine ser#":"RR-P3456", "starboard engine ser#":"RR-S3456","centre engine ser#":"RR-C3456","gearboxes ser#":[ "WHM3456", "WHI3456", "WHT3456" ] }}
> ZA108:{"Aircraft Type":"Merlin", "Lifed Items":{ "port engine ser#":"RR-P4567", "starboard engine ser#":"RR-S4567","centre engine ser#":"RR-C4567","gearboxes ser#":[ "WHM4567", "WHI4567", "WHT4567" ] }}
> ZA109:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P9876", "starboard engine":"RR-S9876","gearboxes":[ "WHM9876", "WHI9876", "WHT9876" ] }}
> ZA104:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P8765", "starboard engine":"RR-S8765","gearboxes":[ "WHM8765", "WHI8765", "WHT8765" ] }}
> ZA103:{"Aircraft Type":"Wessex", "Lifed Items":{ "port engine":"RR-P7654", "starboard engine":"RR-S7654","gearboxes":[ "WHM7654", "WHI7654", "WHT7654" ] }}
>
>
>
> What I would like is for the aircraft tail number (e.g. ZA104) to be the key of the
> cache item, and everything after the colon (the aircraft type and the replaceable
> items' serial numbers) to be the cached item value. The cached item value can stay
> as a JSON string.
>
>
> Many thanks
>
> Dave
> --------------------------------------------
> On Fri, 30/11/18, Mike Thomsen <mikerthomsen@gmail.com> wrote:
>
>  Subject: Re: Help with loading a file into a cache
>  To: dev@nifi.apache.org
>  Date: Friday, 30 November, 2018, 15:26
>
>  Dave,
>
>  Can you post a redacted example with dummy
>  data?
>
>  Thanks,
>
>  Mike
>
>  On Fri, Nov 30, 2018 at 7:08 AM DAVID SMITH <davidrsmith@btinternet.com.invalid> wrote:
>
>  > Hi Devs
>  > I am running a NiFi 1.8 cluster; each node has 128 GB of RAM. I need to
>  > load the contents of a file, which is around 5 GB in size, into a
>  > key/value cache.
>  > The file I want to load is produced by another company, so the format it
>  > comes in is not negotiable. The file contains thousands of lines in the
>  > following format:
>  >
>  > <index value1>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}
>  > <index value2>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}
>  > <index value3>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}
>  >
>  > I want the index value to become the key and everything beyond the colon
>  > to become the value.
>  > What would be the most efficient way of reading the file and parsing it
>  > to load into a cache? I thought of reading in the file, splitting the
>  > content on CR/LF, and then splitting on the first colon. I have noticed
>  > that in 1.8 there are some CSV and JSON Readers (controller services).
>  > Would these be a better way of doing this? The problem I can see is that
>  > the file isn't quite a CSV and it isn't quite a JSON array file.
>  > Many thanks
>  > Dave
>
