nifi-dev mailing list archives

From Arun Manivannan <a...@arunma.com>
Subject Re: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
Date Mon, 25 Sep 2017 14:39:52 GMT
Hi All,

Just raised a PR (https://github.com/apache/nifi/pull/2172) for JIRA
NIFI-4416 <https://issues.apache.org/jira/browse/NIFI-4416>

I appreciate your help, Peter and Matt. Could you please take a quick look
and share your comments?

Joe - Could you also check out the JIRA and let me know if I've committed
some crime.

You guys are the best!

Best Regards,
Arun

On Mon, Sep 25, 2017 at 9:44 AM Arun Manivannan <arun@arunma.com> wrote:

> Thanks a lot, gentlemen. JIRA and PR coming through in a few hours.
>
> On Mon, Sep 25, 2017, 09:07 Matt Burgess <mattyb149@gmail.com> wrote:
>
>> Thanks all, if the PR is available tomorrow I can review as well and
>> merge, but I will be on vacation for a week after that. No pressure :)
>>
>> Regards,
>> Matt
>>
>> > On Sep 24, 2017, at 8:57 PM, Joe Witt <joe.witt@gmail.com> wrote:
>> >
>> > Thanks Arun and Peter. Getting that resolved will be nice. The
>> > performance difference of the record reader/writer approach is pretty
>> > fantastic, so the more we can do to iron out these sorts of edges,
>> > the better. Thanks!
>> >
>> >> On Sun, Sep 24, 2017 at 8:56 PM, Peter Wicks (pwicks) <pwicks@micron.com> wrote:
>> >> Arun,
>> >>
>> >> I'm also using Ctrl+A as a delimiter and had the same problem. I
>> >> haven't had time to write up a PR, but it looked like a pretty easy
>> >> fix to me too.
>> >>
>> >> I can't merge the change if you submit it, but I'd be happy to review
>> >> it.
>> >>
>> >> --Peter
>> >>
>> >> -----Original Message-----
>> >> From: Arun Manivannan [mailto:arun@arunma.com]
>> >> Sent: Sunday, September 24, 2017 11:17 PM
>> >> To: Dev@nifi.apache.org
>> >> Subject: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
>> >>
>> >> Hi,
>> >>
>> >> The ConvertCSVToAvro processor has been having performance issues
>> >> when processing files larger than 1 GB, and it was suggested that I
>> >> use ConvertRecord, which leverages the RecordReader and Writer. I ran
>> >> some tests, and they do perform well.
>> >>
>> >> Strangely, the CSVReader doesn't accept a Unicode character as the
>> >> value delimiter; my CSV uses the Control-A (\u0001) character as its
>> >> delimiter.
>> >>
>> >> After some analysis, I see that a minor change is needed in CSVUtils
>> >> to unescape the delimiter, as ConvertCSVToAvro does, and that
>> >> SingleCharacterValidator also needs to be modified.
>> >>
>> >> Please let me know if you believe this isn't an issue or if there's a
>> >> workaround. Otherwise, I'm more than happy to raise an issue and
>> >> submit a PR for review.
>> >>
>> >> Best Regards,
>> >> Arun
>>
>
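For readers following the delimiter discussion in this thread: the fix Arun describes (unescape the configured delimiter in CSVUtils, then validate it as a single character) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from PR 2172 or NiFi's actual CSVUtils/SingleCharacterValidator. The class and method names (DelimiterUnescapeSketch, unescapeDelimiter, isSingleCharacter) are hypothetical, and only the escapes \t, \n, \r and the six-character \uXXXX form are handled.

```java
// Hypothetical sketch only: NOT NiFi's CSVUtils or SingleCharacterValidator.
final class DelimiterUnescapeSketch {

    // Converts a delimiter as typed by a user (e.g. backslash-u-0-0-0-1, or
    // backslash-t) into the actual character it denotes. Anything that is not
    // a recognized escape is returned unchanged.
    static String unescapeDelimiter(String value) {
        if (value == null) {
            throw new IllegalArgumentException("delimiter must not be null");
        }
        // Six-character Unicode escape form, e.g. "\\u0001" for Ctrl+A.
        if (value.length() == 6 && value.startsWith("\\u")) {
            try {
                int codePoint = Integer.parseInt(value.substring(2), 16);
                return String.valueOf((char) codePoint);
            } catch (NumberFormatException e) {
                return value; // not valid hex digits: leave as typed
            }
        }
        switch (value) {
            case "\\t": return "\t";
            case "\\n": return "\n";
            case "\\r": return "\r";
            default:    return value;
        }
    }

    // The single-character check runs on the UNESCAPED value, so the
    // six-character escape for Ctrl+A is accepted as one character.
    static boolean isSingleCharacter(String value) {
        return unescapeDelimiter(value).length() == 1;
    }

    public static void main(String[] args) {
        String typed = "\\u0001"; // what a user would type into the property
        String delim = unescapeDelimiter(typed);
        System.out.println((int) delim.charAt(0));   // code point of Ctrl+A
        System.out.println(isSingleCharacter(typed));
        System.out.println(isSingleCharacter("ab"));
    }
}
```

The design point is simply ordering: validating the raw property text rejects the escape (it is six characters long), whereas unescaping first and validating the result accepts it.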
