nifi-dev mailing list archives

From Arun Manivannan <a...@arunma.com>
Subject Re: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
Date Mon, 25 Sep 2017 01:44:40 GMT
Thanks a lot, gentlemen. JIRA and PR coming through in a few hours.

On Mon, Sep 25, 2017, 09:07 Matt Burgess <mattyb149@gmail.com> wrote:

> Thanks all, if the PR is available tomorrow I can review as well and
> merge, but I will be on vacation for a week after that. No pressure :)
>
> Regards,
> Matt
>
> > On Sep 24, 2017, at 8:57 PM, Joe Witt <joe.witt@gmail.com> wrote:
> >
> > Thanks Arun and Peter.  Getting that resolved will be nice.  The
> > performance difference of the record reader/writer approach in all
> > this is pretty fantastic so the more we can do to iron out these sorts
> > of edges the better.  Thanks!
> >
> >> On Sun, Sep 24, 2017 at 8:56 PM, Peter Wicks (pwicks) <pwicks@micron.com> wrote:
> >> Arun,
> >>
> >> I'm also using Ctrl+A as a delimiter and had the same problem.  I
> >> haven't had time to write up a PR but it looked like a pretty easy
> >> fix to me too.
> >>
> >> I can't merge the change if you submit it, but I'd be happy to
> >> review it.
> >>
> >> --Peter
> >>
> >> -----Original Message-----
> >> From: Arun Manivannan [mailto:arun@arunma.com]
> >> Sent: Sunday, September 24, 2017 11:17 PM
> >> To: Dev@nifi.apache.org
> >> Subject: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
> >>
> >> Hi,
> >>
> >> The ConvertCSVToAvro processor has been having performance issues
> >> while processing files larger than a GB, and I was advised to use
> >> ConvertRecord, which leverages the RecordReader and Writer. I ran
> >> some tests and it performs well.
> >>
> >> Strangely, the CSVReader doesn't accept a Unicode character as the
> >> value delimiter. The Control-A (\u0001) character is the delimiter
> >> of my CSV.
> >>
> >> I did some analysis and I see that a minor change needs to be made
> >> in CSVUtils to unescape the delimiter, as ConvertCSVToAvro does,
> >> and that the SingleCharacterValidator needs to be modified as well.
> >>
> >> Please let me know if you believe this isn't an issue and there's a
> >> workaround for this. Else, I am more than happy to raise an issue
> >> and submit a PR for review.
> >>
> >> Best Regards,
> >> Arun
>
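
The fix discussed in the thread, unescaping the configured delimiter before checking that it is a single character, might look roughly like the sketch below. This is not NiFi's actual code: the class DelimiterSketch and the method unescapeDelimiter are hypothetical names, and it only assumes that the property value arrives as the literal six-character text (backslash, u, four hex digits) rather than as the control character itself.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the CSVUtils or SingleCharacterValidator code.
public class DelimiterSketch {

    // Resolves a configured delimiter to one char. Accepts a plain character,
    // an escaped tab, or a backslash-u escape followed by four hex digits.
    public static char unescapeDelimiter(String configured) {
        if (configured == null || configured.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        String value = configured;
        if (configured.length() == 6 && configured.startsWith("\\u")) {
            // Parse the four hex digits after the two-character prefix.
            value = String.valueOf((char) Integer.parseInt(configured.substring(2), 16));
        } else if (configured.equals("\\t")) {
            value = "\t";
        }
        // Validate AFTER unescaping, so a six-character escape is not
        // rejected for being longer than one character.
        if (value.length() != 1) {
            throw new IllegalArgumentException("delimiter must resolve to exactly one character");
        }
        return value.charAt(0);
    }

    public static void main(String[] args) {
        char delimiter = unescapeDelimiter("\\u0001");
        String line = "a" + delimiter + "b" + delimiter + "c";
        // Quote the delimiter so it is treated literally by the regex split.
        String[] fields = line.split(Pattern.quote(String.valueOf(delimiter)));
        System.out.println(fields.length); // prints 3
    }
}
```

The point of the ordering is that a validator which checks the raw property text for length 1 would reject the escaped form of Ctrl+A, which matches the symptom described above.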
