crunch-dev mailing list archives

From "Muhammad (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-564) Add support for using escape character same as open/close quote character
Date Tue, 29 Sep 2015 21:43:06 GMT


Muhammad commented on CRUNCH-564:

It appears to work with \ as the escape character. I'll update if I run into issues.

On configuration options - I thought you required every option to be provided, because if I do not
provide CSV_BUFFER_SIZE it crashes with an NPE. The following code snippet is what fails:

  final String bufferValue = this.configuration.get(CSVFileSource.CSV_BUFFER_SIZE);
  if ("".equals(bufferValue)) {
    bufferSize = CSVLineReader.DEFAULT_BUFFER_SIZE;
  } else {
    bufferSize = Integer.parseInt(bufferValue);
  }

If I do not provide CSV_INPUT_FILE_ENCODING it also crashes. Both failures happen because
this.configuration.get(...), called with CSVFileSource.CSV_INPUT_FILE_ENCODING or
CSVFileSource.CSV_BUFFER_SIZE, returns *null* rather than an empty string, so execution
goes into the *else* clause.
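
To illustrate the kind of guard I mean, here is a minimal standalone sketch, not the actual Crunch code. The class name, method name, and default value are hypothetical and exist only for this example; the real default is CSVLineReader.DEFAULT_BUFFER_SIZE. It treats both *null* and the empty string as "not configured":

```java
final class BufferSizeSketch {
    // Hypothetical stand-in for CSVLineReader.DEFAULT_BUFFER_SIZE; the real value may differ.
    static final int ASSUMED_DEFAULT_BUFFER_SIZE = 64 * 1024;

    // Returns defaultSize when the raw configuration value is null or blank,
    // otherwise parses the value as an int.
    static int resolveBufferSize(String rawValue, int defaultSize) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return defaultSize;
        }
        return Integer.parseInt(rawValue.trim());
    }

    public static void main(String[] args) {
        // An unset key (null), an empty value, and an explicit value.
        System.out.println(resolveBufferSize(null, ASSUMED_DEFAULT_BUFFER_SIZE));
        System.out.println(resolveBufferSize("", ASSUMED_DEFAULT_BUFFER_SIZE));
        System.out.println(resolveBufferSize("4096", ASSUMED_DEFAULT_BUFFER_SIZE));
    }
}
```

If the configuration object is a Hadoop Configuration, its get(name, defaultValue) and getInt(name, defaultValue) overloads might be another way to avoid the null case, but I have not checked how that fits with the existing code.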

I'm using {code}org.apache.mrunit:mrunit:1.1.0:hadoop2{code} and 

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>                 Key: CRUNCH-564
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
> As a user I would like to use CSVInputFormat to handle the CSV files following this RFC
> Many developers use the Apache StringEscapeUtils.escapeCsv() method to escape their CSVs.
> The method escapes the CSV following RFC 4180.
> The CSVLineReader throws an exception in such a case. We can enhance the code to support
> CSVs that use the same character for escape and quote.
> I would appreciate a comment if someone has knowingly rejected the idea due to some
> technical limitation or a problem with allowing escape and quote to be the same character. By the
> way, Apache HAWQ seems to get around this issue somehow and reads such CSVs all right.

This message was sent by Atlassian JIRA
