nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Need a sample JSON input file for InferAvroSchema -> PutDatabaseRecord
Date Tue, 14 Aug 2018 17:26:18 GMT
Bob,

Your input JSON is a single JSON object whose "producer" field contains an
array of objects. PutDatabaseRecord handles "each" record, but here that is
the single top-level "producer" record, so it is treated as one row, and the
field "owner_producer" isn't in that top-level object/record.
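A quick way to see why the lookup fails, as a minimal Python sketch (field names taken from the JSON later in this thread; this illustrates the shape of the data, not NiFi's internal record handling):

```python
import json

# The flow file content: one top-level object whose only field is "producer".
flow_file = json.loads(
    '{"producer": [{"owner_producer": "US", "fpa": "MATP"}]}'
)

# PutDatabaseRecord sees exactly one record, with exactly one field:
print(list(flow_file.keys()))  # ['producer']

# The required column is one level down, inside the array items,
# so a top-level lookup for it comes up empty:
print("owner_producer" in flow_file)                 # False
print("owner_producer" in flow_file["producer"][0])  # True
```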

I assume you want to put each object of the array as a row into the
database. In that case you will want to "hoist" the array as the top-level
JSON, so PutDatabaseRecord knows that each object within is a single
row/record. Since this is JSON you can use JoltTransformJSON with the
following Chain spec:

[
  {
    "operation": "shift",
    "spec": {
      "producer": {
        "*": "[#1]"
      }
    }
  }
]
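For reference, the shift above is equivalent to this plain-Python hoist (a sketch of what the spec does to the data, not of how NiFi applies Jolt internally; the sample values are abbreviated from the JSON below):

```python
import json

src = {
    "producer": [
        {"fpa": "MATP", "owner_producer": "US"},
        {"fpa": "ELEC", "owner_producer": "US"},
    ]
}

# "producer": {"*": "[#1]"} — each element of the array is shifted to the
# same index of a new top-level array, dropping the "producer" wrapper.
hoisted = [item for item in src["producer"]]

print(json.dumps(hoisted))
# [{"fpa": "MATP", "owner_producer": "US"}, {"fpa": "ELEC", "owner_producer": "US"}]
```

With the array at the top level, each element is one row/record for PutDatabaseRecord.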

Then PutDatabaseRecord should iterate over the array, and the field names
of each row/record will match the column names in the target DB. I think
there is a different processor that lets you specify an optional
"top-level" field name (like "producer" in your case), expects it to
point at an array, and then iterates over that instead of expecting a
top-level array. We could add that, or perhaps better a RecordPath
expression, to PutDatabaseRecord so that you wouldn't have to transform the
whole flow file in order to put a nested array into a database. That
could help with flow files that contain data destined for multiple
tables/DBs, where the original content can feed multiple PutDatabaseRecords
without having to transform/alter it (which is more efficient for NiFi).
I'll think more about that and probably write up an Improvement Jira for it
(or please feel free to do so yourself).

Regards,
Matt


On Tue, Aug 14, 2018 at 12:21 PM Kuhfahl, Bob <rkuhfahl@mitre.org> wrote:

> Sorry for the newbie problems.
>
> For me, I have to format my input file to be more like:
>
>
>
> {"producer": [{
>     "fpa": "MATP",
>     "owner_producer": "US",
>     "prod_lvl_cap": "M",
>     "producer_datetime_last_chg": "20190101",
>     "producer_userid": "mytest",
>     "res_prod": "DJ",
>     "review_date": "20071015"
> },
> {
>     "fpa": "ELEC",
>     "owner_producer": "US",
>     "prod_lvl_cap": "M",
>     "producer_datetime_last_chg": "20190101",
>     "producer_userid": "fdolomite",
>     "res_prod": "DJ",
>     "review_date": "20111118"
> },
> {
>     "fpa": "AFLD",
>     "owner_producer": "US",
>     "prod_lvl_cap": "M",
>     "producer_datetime_last_chg": "20190101",
>     "producer_userid": "brenda",
>     "res_prod": "YF",
>     "review_date": "20140918"
> }]}
>
>
>
> Such that it will parse.  Anything shaped like what was in previous email
> will not make it past InferAvroSchema.
>
> Once I do this, I can define the JsonPathReader in PutDatabaseRecord to
> pick up this schema from ${inferred.avro.schema}
>
> All this works, and I’m confident PutDatabaseRecord is talking to the
> database as I am getting the error:
>
> Record does not have a value for the Required column 'owner_producer'
>
>
>
> The database is the only one that knows that’s a required field.
>
> The data is in the flow, but… not being found.
>
> Something is not lined up right…
>
>
>
> The schema coming out of InferAvroSchema is:
>
>
>
> {
>    "type": "record",
>    "name": "anything",
>    "fields": [{
>       "name": "producer",
>       "type": {
>          "type": "array",
>          "items": {
>             "type": "record",
>             "name": "producer",
>             "fields": [{
>                "name": "fpa",
>                "type": "string",
>                "doc": "Type inferred from '\"MATP\"'"
>             }, {
>                "name": "owner_producer",
>                "type": "string",
>                "doc": "Type inferred from '\"US\"'"
>             }, {
>                "name": "prod_lvl_cap",
>                "type": "string",
>                "doc": "Type inferred from '\"M\"'"
>             }, {
>                "name": "producer_datetime_last_chg",
>                "type": "string",
>                "doc": "Type inferred from '\"20190101\"'"
>             }, {
>                "name": "producer_userid",
>                "type": "string",
>                "doc": "Type inferred from '\"mytest\"'"
>             }, {
>                "name": "res_prod",
>                "type": "string",
>                "doc": "Type inferred from '\"DJ\"'"
>             }, {
>                "name": "review_date",
>                "type": "string",
>                "doc": "Type inferred from '\"20071015\"'"
>             }]
>          }
>       },
>       "doc": "Type inferred from '[{\"fpa\":\"MATP\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"mytest\",\"res_prod\":\"DJ\",\"review_date\":\"20071015\"},{\"midb_sk\":\"10035001359911\",\"midb_source_entity\":\"FacAka\",\"fpa\":\"ELEC\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"fdolomite\",\"res_prod\":\"DJ\",\"review_date\":\"20111118\"},{\"fpa\":\"AFLD\",\"owner_producer\":\"US\",\"prod_lvl_cap\":\"M\",\"producer_datetime_last_chg\":\"20190101\",\"producer_userid\":\"brenda\",\"res_prod\":\"YF\",\"review_date\":\"20140918\"}]'"
>    }]
> }
>
>
>
>
>
> *From: *Matt Burgess <mattyb149@apache.org>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Monday, August 13, 2018 at 11:19 AM
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Need a sample JSON input file for InferAvroSchema
>
>
>
> Bob,
>
>
>
> InferAvroSchema can infer types like boolean, integer, long, float, and
> double, and I believe for JSON it can correctly descend into arrays and
> nested maps/structs/objects. Here is an example record from NiFi provenance
> data that covers most of those (except bool and float/double, but you can
> add those):
>
>
>
> {
>   "eventId" : "7422645d-056e-423b-b280-6305f9daccaa",
>   "eventOrdinal" : 0,
>   "eventType" : "CREATE",
>   "timestampMillis" : 1496934288944,
>   "timestamp" : "2017-06-08T15:04:48.944Z",
>   "durationMillis" : -1,
>   "lineageStart" : 1496934288930,
>   "componentId" : "8821e5d8-015c-1000-30b0-f7211bbf43e5",
>   "componentType" : "GenerateFlowFile",
>   "componentName" : "_GenerateFlowFile",
>   "entityId" : "b99a56c6-e032-4396-915e-24186974b84a",
>   "entityType" : "org.apache.nifi.flowfile.FlowFile",
>   "entitySize" : 52,
>   "updatedAttributes" : {
>     "path" : "./",
>     "uuid" : "b99a56c6-e032-4396-915e-24186974b84a",
>     "filename" : "924304881186293"
>   },
>   "previousAttributes" : { },
>   "actorHostname" : "localhost",
>   "contentURI" : "http://localhost:8989/nifi-api/provenance-events/0/content/output",
>   "previousContentURI" : "http://localhost:8989/nifi-api/provenance-events/0/content/input",
>   "parentIds" : [ ],
>   "childIds" : [ ],
>   "platform" : "nifi",
>   "application" : "NiFi Flow"
> }
>
>
>
> Note that the timestamps are longs, as InferAvroSchema does not support
> Avro logical types (such as timestamp, date, decimal). I'd like to see an
> InferRecordSchema that is record-aware, supports time/date types, etc. I
> wrote up a Jira a while back to cover it [1] but haven't gotten around to
> implementing it yet.
>
>
>
> Regards,
>
> Matt
>
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-4109
>
>
>
>
>
> On Mon, Aug 13, 2018 at 11:02 AM Kuhfahl, Bob <rkuhfahl@mitre.org> wrote:
>
> Trying to develop a sample input file of json data to feed into
> InferAvroSchema so I can feed that into PutDatabaseRecord.
>
> Need a hello world example ☺
>
>
>
> But, to get started, I’d be happy to get InferAvroSchema working.  I’m
> “trial and error”-ing the input file hoping to get lucky, but..
>
>
>
> No log messages; the flow of JSON data is going to failure. I’m reading the
> code for InferAvroSchema(), but it just calls JsonUtil.inferSchema(), so
> I’ll keep digging down that path. But if someone has a sample input that
> demonstrates how it’s supposed to work, I’d be grateful!
