nifi-users mailing list archives

From: Jason Iannone <bread...@gmail.com>
Subject: Re: MergeContent resulting in corrupted JSON
Date: Wed, 10 Jun 2020 18:07:32 GMT
Hey Mark,

I was thinking this over some more: even though Jackson's ObjectMapper isn't
complaining, is it possible that hidden and/or control characters are present
in the JSON values, and that those are causing MergeContent to behave this
way? I looked over the code and nothing jumped out, but there is something we
had to do because of how the publisher is setting Kafka header attributes.
Some attributes are raw bytes rather than strings converted to bytes, and
ConsumeKafka seems to assume that header values can always be converted to a
String. We had to change the encoding to ISO-8859-1 after running into issues
with the bytes getting corrupted.
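
Roughly what I mean, as a minimal self-contained sketch (plain Java with
Jackson; the sample input and class name are made up):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;

public class EncodingSketch {

    // Walk a parsed JSON tree and flag control characters inside string
    // values. Jackson accepts them (escaped) without complaint, but they
    // can confuse anything downstream that works at the byte level.
    static void findControlChars(JsonNode node, String path) {
        if (node.isTextual()) {
            String s = node.asText();
            for (int i = 0; i < s.length(); i++) {
                if (Character.isISOControl(s.charAt(i))) {
                    System.out.printf("control char U+%04X at %s[%d]%n",
                            (int) s.charAt(i), path, i);
                }
            }
        } else if (node.isObject()) {
            Iterator<Map.Entry<String, JsonNode>> it = node.fields();
            while (it.hasNext()) {
                Map.Entry<String, JsonNode> e = it.next();
                findControlChars(e.getValue(), path + "." + e.getKey());
            }
        } else if (node.isArray()) {
            for (int i = 0; i < node.size(); i++) {
                findControlChars(node.get(i), path + "[" + i + "]");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A BEL (0x07) hides in the "name" value but parses as legal JSON.
        JsonNode root = new ObjectMapper().readTree("{\"name\":\"1\\u0007\"}");
        findControlChars(root, "$");

        // Why ISO-8859-1 helped with the header bytes: it maps bytes
        // 0x00-0xFF one-to-one onto chars, so bytes -> String -> bytes is
        // lossless. UTF-8 is not: invalid sequences decode to U+FFFD and
        // the original bytes are gone.
        byte[] raw = {(byte) 0xA1, 0x0F, 0x15, (byte) 0xB1};
        byte[] latin1 = new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 round-trip ok: " + Arrays.equals(raw, latin1)); // true
        System.out.println("UTF-8 round-trip ok: " + Arrays.equals(raw, utf8));        // false
    }
}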

I'm also trying to better understand how the content is being stored in the
content repository, and whether something is going wrong when writing it
out.
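
A quick way to check whether a split lands on a clean record boundary is to
hex-dump a window around the suspect offset; something like this sketch
(plain Java, nothing NiFi-specific; it assumes the merged flowfile content
has been exported to a local file first):

import java.nio.file.Files;
import java.nio.file.Paths;

public class HexPeek {
    // Usage: java HexPeek <file> <offset>
    // Prints ~64 bytes around <offset> as hex plus printable ASCII, so a
    // mid-record split or a stray control byte is visible even when a text
    // editor would hide it.
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int center = Integer.parseInt(args[1]);
        int from = Math.max(0, center - 32);
        int to = Math.min(data.length, center + 32);
        for (int i = from; i < to; i += 16) {
            StringBuilder hex = new StringBuilder();
            StringBuilder ascii = new StringBuilder();
            for (int j = i; j < Math.min(i + 16, to); j++) {
                hex.append(String.format("%02X ", data[j]));
                ascii.append(data[j] >= 0x20 && data[j] < 0x7F ? (char) data[j] : '.');
            }
            System.out.printf("%08X  %-48s %s%n", i, hex, ascii);
        }
    }
}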

Thanks,
Jason

On Tue, Jun 9, 2020 at 8:02 PM Mark Payne <markap14@hotmail.com> wrote:

> Hey Jason,
>
> Thanks for reaching out. That is definitely odd and not something that
> I’ve seen or heard about before.
>
> Are you certain that the data is not being corrupted upstream of the
> processor? I ask because the code for the processor that handles writing
> out the content is pretty straightforward and hasn’t been modified in over
> 3 years, so I would expect to see it happen often if it were a bug in the
> MergeContent processor itself. Any chance that you can create a flow
> template/sample data that recreates the issue? Anything particularly unique
> about your flow?
>
> Thanks
> -Mark
>
>
> > On Jun 9, 2020, at 6:47 PM, Jason Iannone <breadfan@gmail.com> wrote:
> >
> > Hi all,
> >
> > Within NiFi 1.10.0 we're seeing unexpected behavior with MergeContent.
> > The processor is being fed many flowfiles containing individual JSON
> > records. The records have various field types, including a hex-encoded
> > byte[]. We are not trying to merge the JSON records themselves, but
> > rather to consolidate many flowfiles into fewer flowfiles.
> >
> > What we're seeing is that a random flowfile is split, causing the merged
> > file to be invalid JSON. When running multiple bins, we saw the flowfile
> > split across bins.
> >
> > Example:
> > Flowfile 1: {"name": "1", "hexbytes": "A10F15B11D14", "timestamp": "123456789" }
> > Flowfile 2: {"name": "2", "hexbytes": "A10F15D14B11", "timestamp": "123456790" }
> > Flowfile 3: {"name": "3", "hexbytes": "A10F15D14B11", "timestamp": "123456790" }
> >
> > Merged Result:
> > {"name": "1", "hexbytes": "A10F15B11D14", "timestamp": "123456789" }
> > xbytes": "A10F15D14B11", "timestamp": "123456790" }
> > {"name": "3", "hexbytes": "A10F15D14B11", "timestamp": "123456790" }
> > {"name": "3", "h
> >
> > MergeContent Configuration:
> > Concurrent Tasks: 4
> > Merge Strategy: Bin-Packing Algorithm
> > Merge Format: Binary Concatenation
> > Attribute Strategy: Keep Only Common Attributes
> > Minimum Number of Entries: 1000
> > Maximum Number of Entries: 20000
> > Minimum Group Size: 10 KB
> > Maximum Number of Bins: 5
> > Header, Footer, and Demarcator are not set.
> >
> > We then backed off the configuration above, reducing min and max entries,
> > bins to 1, and concurrent tasks to 1, and we still see the same issue.
> >
> > Any insights?
> >
> > Thanks,
> > Jason
>
>
