flink-issues mailing list archives

From "Mikhail Lipkovich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6016) Newlines should be valid in quoted strings in CSV
Date Fri, 01 Sep 2017 16:57:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150827#comment-16150827 ]

Mikhail Lipkovich commented on FLINK-6016:

Thank you for the reply, Luke.
Currently FileInputFormat identifies splits using only block information; no data is actually
read. If I understand you correctly, the suggestion is to modify this reader so that it downloads
all blocks, parses them with respect to quoted newline characters, and returns split boundaries.
The data would therefore be traversed twice: once in a single thread to identify the splits,
and a second time for the actual data processing.
I can probably implement this, but I think it would be better for me to take on a few easier
tasks before diving into this one.
Please let me know if I have misunderstood your comment.
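
To make the first traversal concrete, here is a minimal sketch of the split-identification pass in plain Java. It is not Flink code and does not use FileInputFormat: the class name QuoteAwareSplits, the method alignSplitStarts, and the in-memory byte array standing in for the downloaded blocks are all assumptions made for illustration. It scans the data once, tracks whether the current position is inside a quoted field, and moves each nominal block boundary forward to the next record start that follows an unquoted newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: illustrates the single-threaded
// first pass that finds split boundaries which never cut a quoted field.
public class QuoteAwareSplits {

    /**
     * Returns the start offsets of the splits. Each nominal boundary
     * (a multiple of blockSize) is moved forward to the first record start,
     * i.e. the byte after a newline that is not inside a quoted field.
     */
    static List<Long> alignSplitStarts(byte[] data, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        starts.add(0L);                      // the first split always starts at offset 0
        boolean inQuotes = false;
        long nextBoundary = blockSize;
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (b == '\n' && !inQuotes) {
                long recordStart = i + 1L;
                if (recordStart >= nextBoundary && recordStart < data.length) {
                    starts.add(recordStart);
                    // Skip nominal boundaries that fell inside a long record.
                    while (nextBoundary <= recordStart) {
                        nextBoundary += blockSize;
                    }
                }
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // The record from the issue description, followed by a second record.
        byte[] csv = "\"3\n4\",5\n6,7\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(alignSplitStarts(csv, 3));
    }
}
```

With a block size of 3 bytes, the nominal boundary at offset 3 falls inside the quoted field "3\n4", so the sketch skips the quoted newline and places the next split at offset 8, where the record 6,7 begins. The cost is the one discussed above: the whole input has to be read sequentially before any parallel processing can start.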

> Newlines should be valid in quoted strings in CSV
> -------------------------------------------------
>                 Key: FLINK-6016
>                 URL: https://issues.apache.org/jira/browse/FLINK-6016
>             Project: Flink
>          Issue Type: Bug
>          Components: Batch Connectors and Input/Output Formats
>    Affects Versions: 1.2.0
>            Reporter: Luke Hutchison
> The RFC for the CSV format specifies that newlines are valid in quoted strings in CSV:
> https://tools.ietf.org/html/rfc4180
> However, when parsing a CSV file with Flink containing a newline, such as:
> {noformat}
> "3
> 4",5
> {noformat}
> you get this exception:
> {noformat}
> Line could not be parsed: '"3'
> Expect field types: class java.lang.String, class java.lang.String 
> {noformat}

This message was sent by Atlassian JIRA