flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2170) Add OrcTableSource
Date Tue, 21 Nov 2017 21:12:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261489#comment-16261489 ]

ASF GitHub Bot commented on FLINK-2170:

GitHub user fhueske opened a pull request:


    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.

    ## What is the purpose of the change
    * Adds `OrcRowInputFormat` to read [ORC files](https://orc.apache.org) as `DataSet<Row>`. The input format supports projection and filter push-down.
    * Adds `OrcTableSource` to read [ORC files](https://orc.apache.org) as a `Table` in a batch Table API or SQL query. The table source supports projection and filter push-down.
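To illustrate what filter push-down can save for a columnar format such as ORC, here is a minimal, self-contained Java sketch. It assumes a simplified model in which a file consists of stripes that each carry min/max statistics for a single INT column; `Stripe`, `LessThanFilter`, `prune` and `scan` are hypothetical names and are not part of the ORC or Flink API. Stripes whose statistics prove that no row can match the predicate are skipped without being decoded.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of filter push-down into a columnar reader.
// Stripe and LessThanFilter are illustrative names, not ORC or Flink classes.
public class FilterPushDownSketch {

    // A stripe of one INT column, with the min/max statistics a writer would record.
    static final class Stripe {
        final int[] values;
        final int min;
        final int max;

        Stripe(int... values) {
            if (values.length == 0) {
                throw new IllegalArgumentException("a stripe needs at least one value");
            }
            int lo = values[0];
            int hi = values[0];
            for (int v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.values = values;
            this.min = lo;
            this.max = hi;
        }
    }

    // The pushed-down predicate: column < bound.
    static final class LessThanFilter {
        final int bound;

        LessThanFilter(int bound) {
            this.bound = bound;
        }

        // If even the smallest value fails the predicate, no row in the stripe can match.
        boolean canSkip(Stripe stripe) {
            return stripe.min >= bound;
        }

        boolean test(int value) {
            return value < bound;
        }
    }

    // Stripes that survive pruning on statistics alone; only these are decoded.
    static List<Stripe> prune(List<Stripe> stripes, LessThanFilter filter) {
        List<Stripe> kept = new ArrayList<>();
        for (Stripe s : stripes) {
            if (!filter.canSkip(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Statistics only rule out whole stripes, so rows of kept stripes are still checked.
    static List<Integer> scan(List<Stripe> stripes, LessThanFilter filter) {
        List<Integer> out = new ArrayList<>();
        for (Stripe s : prune(stripes, filter)) {
            for (int v : s.values) {
                if (filter.test(v)) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = List.of(new Stripe(1, 5, 3), new Stripe(10, 12, 15), new Stripe(4, 20));
        LessThanFilter filter = new LessThanFilter(8);
        System.out.println(prune(stripes, filter).size()); // 2
        System.out.println(scan(stripes, filter));         // [1, 5, 3, 4]
    }
}
```

Because min/max statistics can only rule out whole stripes, the sketch still evaluates the predicate on every row of the stripes it keeps; the middle stripe (min 10) is never read for the predicate `value < 8`.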
    ## Brief change log
    * Create a new module `flink-connectors/flink-orc`
    * Add `OrcRowInputFormat`
    * Add `OrcTableSource`
    * Add tests for the input format and the table source
    * Adjust the cost model of batch table scans to favor table sources with pushed-down filters over those without
    * Add a static method to `RowTypeInfo` to project a row type onto a subset of its fields
    * Improve the translation of literals in `RexProgramExtractor`
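Projection push-down and the new `RowTypeInfo` helper both revolve around mapping a schema onto a subset of field indexes. The following sketch is hypothetical: it uses plain arrays instead of Flink's `RowTypeInfo`, and `RowSchema`, `project` and `projectRow` are illustrative names, not the method added by this pull request.

```java
import java.util.Arrays;

// Hypothetical sketch of projecting a row schema (and rows) onto a subset of fields.
// RowSchema is a stand-in for a row type such as Flink's RowTypeInfo, not its actual code.
public class ProjectionSketch {

    static final class RowSchema {
        final String[] names;
        final String[] types;

        RowSchema(String[] names, String[] types) {
            if (names.length != types.length) {
                throw new IllegalArgumentException("names and types must have the same length");
            }
            this.names = names;
            this.types = types;
        }

        // Keeps only the fields at the given indexes, in the order of the mapping.
        RowSchema project(int[] fieldMapping) {
            String[] projectedNames = new String[fieldMapping.length];
            String[] projectedTypes = new String[fieldMapping.length];
            for (int i = 0; i < fieldMapping.length; i++) {
                int idx = checkIndex(fieldMapping[i], names.length);
                projectedNames[i] = names[idx];
                projectedTypes[i] = types[idx];
            }
            return new RowSchema(projectedNames, projectedTypes);
        }
    }

    // Applies the same mapping to a row, so only the requested columns are materialized.
    static Object[] projectRow(Object[] row, int[] fieldMapping) {
        Object[] out = new Object[fieldMapping.length];
        for (int i = 0; i < fieldMapping.length; i++) {
            out[i] = row[checkIndex(fieldMapping[i], row.length)];
        }
        return out;
    }

    private static int checkIndex(int idx, int arity) {
        if (idx < 0 || idx >= arity) {
            throw new IndexOutOfBoundsException("field index " + idx + " out of range for arity " + arity);
        }
        return idx;
    }

    public static void main(String[] args) {
        RowSchema full = new RowSchema(
                new String[] {"id", "name", "price", "ts"},
                new String[] {"INT", "STRING", "DOUBLE", "TIMESTAMP"});
        int[] mapping = {2, 0}; // e.g. SELECT price, id FROM t
        RowSchema projected = full.project(mapping);
        System.out.println(Arrays.toString(projected.names)); // [price, id]
        System.out.println(Arrays.toString(projected.types)); // [DOUBLE, INT]
        System.out.println(Arrays.toString(projectRow(new Object[] {7, "pen", 1.5, null}, mapping))); // [1.5, 7]
    }
}
```

Using one index mapping for both the schema and the rows keeps the two consistent: the projected type describes exactly the columns that the reader hands back, in the order the query asked for them.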
    ## Verifying this change
    * `OrcRowInputFormatTest` verifies
      * Correct configuration of ORC readers
      * Results when reading ORC files
      * Schema evolution support
      * Computation of split boundaries
    * `OrcTableSourceTest` verifies
      * Correct implementation of the `TableSource` interface methods
      * Correct configuration of `OrcRowInputFormat` for test queries (predicate and filter
    * `OrcTableSourceITCase` runs end-to-end tests with SQL queries.
    ## Does this pull request potentially affect one of the following parts:
      - Dependencies (does it add or upgrade a dependency): **yes**, adds a new Maven module `flink-orc` with a dependency on `org.apache.orc/orc-core`
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
      - The serializers: **no**
      - The runtime per-record code paths (performance sensitive): **no**
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
      - The S3 file system connector: **no**
    ## Documentation
      - Does this pull request introduce a new feature? **yes**
      - If yes, how is the feature documented? Documentation for `OrcTableSource` was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink table-ORC

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5043

commit 2f524dfa0c4f8468691151925a622ba7fee55f0f
Author: uybhatti <uybhatti@gmail.com>
Date:   2017-03-03T22:55:22Z

    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.

commit d80506e3785268f541457a69ade3118c634cf7e7
Author: Fabian Hueske <fhueske@apache.org>
Date:   2017-11-13T13:54:54Z

    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.


> Add OrcTableSource
> ------------------
>                 Key: FLINK-2170
>                 URL: https://issues.apache.org/jira/browse/FLINK-2170
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: Usman Younas
>            Priority: Minor
>              Labels: starter
> Add an {{OrcTableSource}} to read data from an ORC file. The {{OrcTableSource}} should
> implement the {{ProjectableTableSource}} (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces.

This message was sent by Atlassian JIRA
