nifi-dev mailing list archives

From Arun Manivannan <a...@arunma.com>
Subject ConvertCSVToAvro vs CSVReader - Value Delimiter
Date Sun, 24 Sep 2017 15:16:58 GMT
Hi,

The ConvertCSVToAvro processor has been having performance issues while
processing files larger than 1 GB, and I was advised to use ConvertRecord
instead, which leverages the RecordReader and RecordWriter. I ran some
tests and it performs well.

Strangely, the CSVReader doesn't accept a Unicode escape as the value
delimiter. The delimiter of my CSV is the Control-A character (\u0001).

I did some analysis, and it looks like a minor change is needed in
CSVUtils to unescape the delimiter, as ConvertCSVToAvro does, along with
a matching change to the SingleCharacterValidator.
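
To make the idea concrete, here is a minimal, self-contained sketch of the
kind of unescaping I have in mind. It is not the actual CSVUtils or
ConvertCSVToAvro code: the class and method names (DelimiterUnescape,
unescape, toDelimiter) are hypothetical, and it only handles Unicode, tab
and backslash escapes. A library routine such as Apache Commons Lang's
StringEscapeUtils.unescapeJava could probably do the same job.

```java
public class DelimiterUnescape {

    // Hypothetical helper: turns the configured property text into the single
    // delimiter character, rejecting values that are not one char after unescaping.
    public static char toDelimiter(String configured) {
        String s = unescape(configured);
        if (s.length() != 1) {
            throw new IllegalArgumentException(
                "Value delimiter must be exactly one character after unescaping");
        }
        return s.charAt(0);
    }

    // Hypothetical helper: resolves Unicode escapes (backslash, 'u', four hex
    // digits), tab escapes and escaped backslashes; everything else is copied as-is.
    public static String unescape(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '\\' && i + 1 < in.length()) {
                char next = in.charAt(i + 1);
                if (next == 'u' && i + 6 <= in.length()) {
                    // Parse the four hex digits that follow the 'u'.
                    out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                if (next == 't') {
                    out.append('\t');
                    i += 2;
                    continue;
                }
                if (next == '\\') {
                    out.append('\\');
                    i += 2;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The property value arrives as six literal characters: backslash, u, 0, 0, 0, 1.
        char delimiter = toDelimiter("\\u0001");
        // Split a Control-A separated record using the resolved delimiter.
        String[] fields = "a\u0001b\u0001c".split(String.valueOf(delimiter));
        System.out.println(fields.length);
    }
}
```

With something like this in place, the single-character check would run on
the unescaped value rather than on the raw property text, which is why the
validator would need to change as well.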

Please let me know if you believe this isn't an issue, or if there's a
workaround. Otherwise, I am more than happy to raise an issue and
submit a PR for review.

Best Regards,
Arun
