spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-18658) Writing to a text DataSource buffers one or more lines in memory
Date Fri, 02 Dec 2016 05:41:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-18658.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> Writing to a text DataSource buffers one or more lines in memory
> ----------------------------------------------------------------
>
>                 Key: SPARK-18658
>                 URL: https://issues.apache.org/jira/browse/SPARK-18658
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Nathan Howell
>            Assignee: Nathan Howell
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> The JSON and CSV writing paths buffer entire lines (or multiple lines) in memory prior
> to writing to disk. For large rows this is inefficient. It may make sense to skip the
> {{TextOutputFormat}} record writer and write directly to the underlying {{FSDataOutputStream}},
> allowing the writers to append arbitrary byte arrays (fractions of a row) instead of a full row.
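The issue only proposes the idea. Below is a minimal Scala sketch of the streaming approach, assuming a simple comma-separated format with no quoting or escaping. `StreamingLineWriter` and `writeRow` are hypothetical names, not Spark's actual API. A plain `java.io.OutputStream` stands in for the `FSDataOutputStream` (which is itself a `java.io.OutputStream` subclass), so the sketch runs without Hadoop on the classpath.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical streaming writer: appends each field of a row directly to the
// underlying stream instead of first building the whole line in memory.
// In Spark the stream would be the FSDataOutputStream opened for the task's
// output file; here any java.io.OutputStream works.
// Assumption: no CSV quoting/escaping is performed.
final class StreamingLineWriter(out: OutputStream, separator: Char = ',') {
  private val sepBytes = separator.toString.getBytes(StandardCharsets.UTF_8)
  private val newline = "\n".getBytes(StandardCharsets.UTF_8)

  // Writes one row; only one field's bytes are materialized at a time,
  // so no full-line buffer is ever built.
  def writeRow(fields: Iterator[String]): Unit = {
    var first = true
    fields.foreach { field =>
      if (!first) out.write(sepBytes)
      out.write(field.getBytes(StandardCharsets.UTF_8))
      first = false
    }
    out.write(newline)
  }

  def close(): Unit = out.close()
}

object StreamingLineWriterDemo {
  def main(args: Array[String]): Unit = {
    // An in-memory sink stands in for the file stream in this demo.
    val sink = new ByteArrayOutputStream()
    val writer = new StreamingLineWriter(sink)
    writer.writeRow(Iterator("a", "b", "c"))
    writer.writeRow(Iterator("1", "2"))
    writer.close()
    print(sink.toString("UTF-8"))
  }
}
```

Because each field is written as soon as it is encoded, peak memory is bounded by the largest single field rather than the whole row. A real writer would also need CSV quoting or JSON escaping, and would typically rely on a buffered stream to avoid many tiny writes to the file system.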



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


