nifi-dev mailing list archives

From Joe Skora <jsk...@gmail.com>
Subject Re: [jira] [Created] (NIFI-921) Create a processor to promote character delimited data to attributes
Date Thu, 03 Sep 2015 12:57:55 GMT
Is the (very) general idea sort of a combination of the functionality of
ExtractText and UpdateAttribute?

On Thu, Sep 3, 2015 at 12:37 AM, Aldrin Piri (JIRA) <jira@apache.org> wrote:

> Aldrin Piri created NIFI-921:
> --------------------------------
>
>              Summary: Create a processor to promote character delimited
> data to attributes
>                  Key: NIFI-921
>                  URL: https://issues.apache.org/jira/browse/NIFI-921
>              Project: Apache NiFi
>           Issue Type: Improvement
>           Components: Extensions
>             Reporter: Aldrin Piri
>             Priority: Minor
>
>
> A processor that can analyze content and promote character delimited data
> to attributes could prove quite helpful.
>
> There are a large number of "schemas"/formats that are simply character
> delimited formats.  Typically these records are quite small in size but
> "rich" in terms of the values that they possess.  This processor would
> provide an easy means to handle these simpler formats and to reason about
> data in this class of formats.
>
> We can approximate this by applying a regular expression within
> ExtractText and capturing groups, but this kind of data is not a good fit
> for regexes.
>
> The processor would likely be fed by a SplitText processor but, with
> some reasonable consideration, could also handle splitting the text along
> rows itself, generating a unique flowfile for each.  The exact contract
> would need some consideration in terms of the content that passes through
> (entirety of original file, row by itself, row with header if it exists).
>
> Additionally, the processor could also consider whether there is a header,
> delimited in the same fashion as each of its constituent records.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
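
As a rough illustration of the idea in the issue, the Java sketch below splits content into rows and promotes each delimited row to a map of attribute names to values, taking the names from an optional header row. This is not NiFi code and uses no NiFi API; the class name, method names, and the positional "column.N" fallback for rows without a header are hypothetical assumptions. In an actual processor, each map would presumably become the attributes of one flowfile.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of NiFi) sketching NIFI-921: promote
 * character delimited fields to a map of attribute name -> value.
 */
public class DelimitedRecordAttributes {

    /** Splits on a literal delimiter, keeping trailing empty fields. */
    static String[] split(String row, String delimiter) {
        // Pattern.quote makes "|" literal instead of a regex alternation;
        // limit -1 keeps trailing empties, e.g. "43|Bob|" -> [43, Bob, ""].
        return row.split(Pattern.quote(delimiter), -1);
    }

    /**
     * Builds attributes for one row. Header fields (if any) name the
     * attributes; fields without a header name get the hypothetical
     * positional names "column.1", "column.2", ...
     */
    static Map<String, String> toAttributes(String header, String row, String delimiter) {
        String[] names = header == null ? new String[0] : split(header, delimiter);
        String[] values = split(row, delimiter);
        Map<String, String> attributes = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            String name = i < names.length ? names[i].trim() : "column." + (i + 1);
            attributes.put(name, values[i]); // duplicate header names overwrite
        }
        return attributes;
    }

    /**
     * Splits content into rows and returns one attribute map per row,
     * mirroring the "unique flowfile for each row" option. If hasHeader is
     * true, the first line is used as the header.
     */
    static List<Map<String, String>> promoteAll(String content, String delimiter, boolean hasHeader) {
        List<Map<String, String>> result = new ArrayList<>();
        String[] lines = content.split("\\R"); // \R matches any line break (Java 8+)
        String header = hasHeader && lines.length > 0 ? lines[0] : null;
        for (int i = hasHeader ? 1 : 0; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            result.add(toAttributes(header, lines[i], delimiter));
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "id|name|city\n42|Alice|Boston\n43|Bob|\n";
        for (Map<String, String> attrs : promoteAll(content, "|", true)) {
            System.out.println(attrs);
        }
    }
}
```

Running main prints {id=42, name=Alice, city=Boston} and then {id=43, name=Bob, city=}. The sketch deliberately ignores quoting and escaping (for example CSV fields that contain the delimiter inside quotes); a real processor would need to decide how to handle those, which is part of the "exact contract" question raised above.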
