flink-issues mailing list archives

From "Xingcan Cui (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-10684) Improve the CSV reading process
Date Fri, 26 Oct 2018 04:21:00 GMT
Xingcan Cui created FLINK-10684:

             Summary: Improve the CSV reading process
                 Key: FLINK-10684
                 URL: https://issues.apache.org/jira/browse/FLINK-10684
             Project: Flink
          Issue Type: Improvement
          Components: Core
            Reporter: Xingcan Cui

CSV is one of the most commonly used file formats in data wrangling. To load records from
CSV files, Flink provides the basic {{CsvInputFormat}}, as well as some variants (e.g.,
{{RowCsvInputFormat}} and {{PojoCsvInputFormat}}). However, the reading process could be
improved in several ways. For example, we could add a built-in utility that automatically infers
schemas from CSV headers and samples of data. Also, the current bad-record handling, which only
logs the total number of invalid lines, could be improved by keeping the invalid lines themselves
(and even the reasons why parsing failed).
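The two improvements proposed in the description could look roughly like the following plain-Java sketch. It is not Flink's API. The class `CsvSketch`, its methods, the type labels (`LONG`, `DOUBLE`, `BOOLEAN`, `STRING`), and the `BadRecord` holder are all hypothetical. The parser also assumes a single-character delimiter with no quoting or escaping. For each column, schema inference picks the narrowest type that accepts every non-empty sample value. Bad-record handling keeps each rejected line with its position and a reason, rather than only counting it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical plain-Java sketch (not Flink API) of two proposed CSV improvements:
 * schema inference from a header plus sample rows, and retaining bad records with a reason.
 * Assumption: a simple single-character delimiter, no quoting or escaping.
 */
final class CsvSketch {

    /** A rejected data line, kept instead of only being counted. */
    static final class BadRecord {
        final int lineNumber; // 1-based index within the data lines (header excluded)
        final String line;
        final String reason;

        BadRecord(int lineNumber, String line, String reason) {
            this.lineNumber = lineNumber;
            this.line = line;
            this.reason = reason;
        }
    }

    /** Splits on the delimiter, keeping trailing empty fields (limit -1). */
    static String[] split(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    /** Returns the narrowest type label that accepts every non-empty sample value. */
    static String inferType(List<String> values) {
        boolean isLong = true, isDouble = true, isBoolean = true;
        int nonEmpty = 0;
        for (String raw : values) {
            String v = raw.trim();
            if (v.isEmpty()) continue; // empty fields are treated as missing values
            nonEmpty++;
            try { Long.parseLong(v); } catch (NumberFormatException e) { isLong = false; }
            try { Double.parseDouble(v); } catch (NumberFormatException e) { isDouble = false; }
            if (!v.equalsIgnoreCase("true") && !v.equalsIgnoreCase("false")) isBoolean = false;
        }
        if (nonEmpty == 0) return "STRING"; // no evidence: fall back to the widest type
        if (isLong) return "LONG";
        if (isDouble) return "DOUBLE";
        if (isBoolean) return "BOOLEAN";
        return "STRING";
    }

    /** Infers an ordered column-name -> type mapping from a header line and sample lines. */
    static Map<String, String> inferSchema(String header, List<String> samples, char delimiter) {
        String[] names = split(header, delimiter);
        Map<String, String> schema = new LinkedHashMap<>();
        for (int col = 0; col < names.length; col++) {
            List<String> column = new ArrayList<>();
            for (String line : samples) {
                String[] fields = split(line, delimiter);
                if (col < fields.length) column.add(fields[col]);
            }
            schema.put(names[col].trim(), inferType(column));
        }
        return schema;
    }

    /** Converts one field to the given type; empty fields become null. */
    static Object convert(String v, String type) {
        if (v.isEmpty()) return null;
        switch (type) {
            case "LONG": return Long.parseLong(v);     // NumberFormatException on bad input
            case "DOUBLE": return Double.parseDouble(v);
            case "BOOLEAN":
                if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) return Boolean.parseBoolean(v);
                throw new IllegalArgumentException("not a boolean: '" + v + "'");
            default: return v;
        }
    }

    /** Parses data lines; invalid lines are appended to badRecords together with the failure reason. */
    static List<Object[]> parse(List<String> lines, Map<String, String> schema,
                                char delimiter, List<BadRecord> badRecords) {
        List<String> names = new ArrayList<>(schema.keySet());
        List<String> types = new ArrayList<>(schema.values());
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            String[] fields = split(line, delimiter);
            if (fields.length != types.size()) {
                badRecords.add(new BadRecord(i + 1, line,
                        "expected " + types.size() + " fields but found " + fields.length));
                continue;
            }
            Object[] row = new Object[fields.length];
            int c = 0;
            try {
                for (; c < fields.length; c++) row[c] = convert(fields[c].trim(), types.get(c));
                rows.add(row);
            } catch (IllegalArgumentException e) { // also covers NumberFormatException
                badRecords.add(new BadRecord(i + 1, line, "column '" + names.get(c) + "': " + e.getMessage()));
            }
        }
        return rows;
    }
}
```

For example, `inferSchema("id,score,active,name", samples, ',')` over the sample lines `1,2.5,true,alice` and `2,3,false,bob` yields `{id=LONG, score=DOUBLE, active=BOOLEAN, name=STRING}`. Parsing `x,1.0,true,dave` against that schema then records a `BadRecord` whose reason names column `id`, so the line is not silently dropped. A real implementation would also need to handle quoted fields and choose how many sample rows are enough.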

This is an umbrella issue for all the improvements and bug fixes for the CSV reading process.

This message was sent by Atlassian JIRA
