nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Validating an array of objects using ConvertJSONToAvro
Date Sat, 04 Feb 2017 03:10:05 GMT
Bas,

Sorry for the late reply; I should have mentioned sooner that I am
looking into this issue. From your description, it seems that
ConvertJSONToAvro should be able to handle this kind of input. If I
can't find a schema that fits and instead confirm that this is a bug
or a needed improvement, I will write up a Jira. Either way, I will
inform this list. Thank you for your question; IMO this is indeed a
valid use case that should be supported.

Regards,
Matt

On Tue, Jan 31, 2017 at 9:10 AM, Bas van Kortenhof
<bas.vankortenhof@sanoma.com> wrote:
> Hi all,
>
> Not completely sure if this is a developer or user question, but I'm posting
> it here for now as at this moment it is related to flow design.
>
> What I'm trying to achieve is to get a JSON response from an API, extract
> the relevant values, validate this data, and convert it to Avro. I am able to
> complete the first two steps with InvokeHTTP and JoltTransformJSON, after
> which my data is an array of JSON objects, so my flowfile looks like
> this:
>
> [
>   {"key1": "val1", "key2": "val2"},
>   {"key1": "val3", "key2": "val4"}
> ]
>
> My idea was to pass this JSON to ConvertJSONToAvro together with the
> appropriate Avro schema. However, ConvertJSONToAvro cannot apply schema
> validation to the individual elements of an array. It can, however, apply
> schema validation to records that are not contained in an array but are
> separated by newlines, so it can handle the following flowfile (note that
> this, at the file level, is not valid JSON):
>
> {"key1": "val1", "key2": "val2"}
> {"key1": "val3", "key2": "val4"}
>
> I can achieve this in NiFi by splitting the JSON flowfile with SplitJson and
> immediately merging it back together with a MergeContent processor that uses
> '\n' as the demarcator. Both steps have to be applied before
> ConvertJSONToAvro, because otherwise invalid records would cause the merge
> step to fail. As a result, the split can't even be used to redistribute
> files in a cluster setting, so I don't really like this workaround.
>
> I was wondering if anyone knows a way to produce the second example format
> using a JOLT transformation, which would be an elegant fix. If not, I'd
> like to ask whether there is a reason that ConvertJSONToAvro can only handle
> newline-separated objects and not objects in an array (which, in my opinion,
> is the closest JSON representation of the concept of records in Avro). If
> there is no such reason, I think this can be considered a bug, and I would
> like to propose adding an option to the ConvertJSONToAvro processor to apply
> schema validation to the whole file, to objects separated by newlines, or to
> objects in an array.
>
> Please let me know what you think!
>
> Regards,
> Bas
>
>
>
> --
> View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Validating-an-array-of-objects-using-ConvertJSONToAvro-tp832.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
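
For reference, the reshaping discussed in the quoted message (a JSON array
of objects turned into one object per line) can be sketched in a few lines
of plain Python. This only illustrates the target format; it is not a NiFi
processor or a JOLT spec, and the function name array_to_ndjson is
hypothetical.

```python
import json


def array_to_ndjson(text: str) -> str:
    """Convert a JSON array of objects into newline-separated JSON objects.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Emit one JSON object per line, preserving the original order.
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    flowfile = (
        '[{"key1": "val1", "key2": "val2"},'
        ' {"key1": "val3", "key2": "val4"}]'
    )
    print(array_to_ndjson(flowfile))
```

The output has the same shape as the second example in the quoted message:
each line is a valid JSON object, while the file as a whole is not valid
JSON.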
