parquet-dev mailing list archives

From "Uwe L. Korn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PARQUET-1022) [C++] Append mode in parquet-cpp
Date Wed, 13 Mar 2019 09:24:00 GMT

    [ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791492#comment-16791492
] 

Uwe L. Korn commented on PARQUET-1022:
--------------------------------------

There is no implementation of merging/concatenating files in C++ yet, but that would be much
easier to implement than an append mode. To merge files, you would read the footers of all
input files, copy the binary content of their RowGroups into the new file, and then compute a new
footer from the information in the existing footers. As far as I can tell, this should not
require decoding any RowGroups, so an explicit merge would be a lot faster than materialising the
data first and writing an entirely new file.
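
A minimal sketch of that merge idea, using a toy in-memory model rather than the parquet-cpp API.
The names ToyFile, RowGroupMeta, MergeToyFiles and TotalRows are hypothetical and exist only for
this illustration. The sketch copies each row group's bytes verbatim and rebuilds the footer
entries with shifted offsets. A real Parquet footer also stores offsets per column chunk, which
would presumably need the same shift, and the input schemas are assumed to be identical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row group entry in a footer.
struct RowGroupMeta {
  std::int64_t file_offset;  // byte position of the row group inside its file body
  std::int64_t byte_size;    // length of the encoded row group in bytes
  std::int64_t num_rows;     // number of rows stored in the row group
};

// Hypothetical toy file: an opaque body plus a "footer" describing it.
struct ToyFile {
  std::string body;                      // encoded row group bytes, never decoded here
  std::vector<RowGroupMeta> row_groups;  // footer entries pointing into body
};

// Merge the inputs by copying row group bytes as-is and computing a new footer.
ToyFile MergeToyFiles(const std::vector<ToyFile>& inputs) {
  ToyFile out;
  for (const ToyFile& in : inputs) {
    for (const RowGroupMeta& rg : in.row_groups) {
      // The footer entry must describe a valid byte range of the input body.
      assert(rg.file_offset >= 0 && rg.byte_size >= 0);
      assert(static_cast<std::size_t>(rg.file_offset + rg.byte_size) <= in.body.size());

      RowGroupMeta moved = rg;
      // The row group now starts wherever the output body currently ends.
      moved.file_offset = static_cast<std::int64_t>(out.body.size());
      // Binary copy of the encoded bytes; no decoding takes place.
      out.body.append(in.body, static_cast<std::size_t>(rg.file_offset),
                      static_cast<std::size_t>(rg.byte_size));
      out.row_groups.push_back(moved);
    }
  }
  return out;
}

// Total row count of a file, derived from its footer alone.
std::int64_t TotalRows(const ToyFile& file) {
  std::int64_t total = 0;
  for (const RowGroupMeta& rg : file.row_groups) total += rg.num_rows;
  return total;
}
```

Only the footer entries are rewritten; the row group bytes pass through untouched, which is
why this route avoids the cost of decoding and re-encoding the data.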

> [C++] Append mode in parquet-cpp
> --------------------------------
>
>                 Key: PARQUET-1022
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1022
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>    Affects Versions: cpp-1.1.0
>            Reporter: yugu
>            Assignee: Wes McKinney
>            Priority: Major
>
> As said, currently trying to work out an append feature for parquet files in c++.
> (been searching through repo etc, can't find example tho..)
> Current solution is to (assume no schema changes that is):
> Read in metadata
> Change metadata based on appended rows+ original rows
> Append a new row group (or multiple row group writer)
> Write the new rows.
> ---
> The problem is that, if approached this way, the original last row group may not be completely
> filled. Was wondering if there is a fix or I'm using the api wrong...
> Thanks ! : D



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
