flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Parquet example
Date Tue, 11 Nov 2014 11:44:35 GMT
First of all, split locality can make a huge difference.
A dedicated InputFormat would also enable tighter integration, both in the
API and in the execution, for example by pushing filters or projections
directly into the data source and thereby reducing the amount of data that
has to be read from the file system.
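
To make the pushdown point concrete, here is a minimal sketch in Python. It is
a toy model only, not Flink's or Parquet's API: RowGroup, read_then_filter,
and read_with_pushdown are hypothetical names, and the per-group min/max
statistics are an assumption standing in for the kind of metadata a columnar
format can keep alongside the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class RowGroup:
    """One block of a toy columnar file: column name -> list of values."""
    columns: Dict[str, List[int]]
    # Precomputed (min, max) per column; stands in for stored metadata.
    stats: Dict[str, Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        self.stats = {c: (min(v), max(v)) for c, v in self.columns.items()}

    def num_rows(self) -> int:
        return len(next(iter(self.columns.values())))


def read_then_filter(groups: Sequence[RowGroup], wanted: Sequence[str],
                     filter_column: str, lower_bound: int):
    """Baseline: read every value of every column, then filter and project."""
    values_read = 0
    rows = []
    for g in groups:
        values_read += g.num_rows() * len(g.columns)
        for i in range(g.num_rows()):
            if g.columns[filter_column][i] >= lower_bound:
                rows.append(tuple(g.columns[c][i] for c in wanted))
    return rows, values_read


def read_with_pushdown(groups: Sequence[RowGroup], wanted: Sequence[str],
                       filter_column: str, lower_bound: int):
    """Pushdown: skip groups via min/max stats, read only the needed columns."""
    # Projection: only the output columns plus the filter column are touched.
    needed = list(dict.fromkeys(list(wanted) + [filter_column]))
    values_read = 0
    rows = []
    for g in groups:
        _, highest = g.stats[filter_column]
        if highest < lower_bound:
            continue  # Filter pushdown: no row in this group can match.
        values_read += g.num_rows() * len(needed)
        for i in range(g.num_rows()):
            if g.columns[filter_column][i] >= lower_bound:
                rows.append(tuple(g.columns[c][i] for c in wanted))
    return rows, values_read


# Made-up example data: two row groups with three columns each.
GROUPS = [
    RowGroup({"id": [1, 2, 3], "year": [2010, 2011, 2012], "size": [10, 20, 30]}),
    RowGroup({"id": [4, 5, 6], "year": [2013, 2014, 2014], "size": [40, 50, 60]}),
]

if __name__ == "__main__":
    print(read_then_filter(GROUPS, ["id"], "year", 2014))
    print(read_with_pushdown(GROUPS, ["id"], "year", 2014))
```

Both functions return the same rows for "id where year >= 2014", but on this
made-up data the baseline touches 18 values while the pushdown variant touches
6, because it skips the first row group and never reads the "size" column.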

2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Maybe this is a dumb question, but could you explain to me what the
> benefits of a dedicated Flink IF are vs. the one available by default via
> the Hadoop IF wrapper?
> Is it just because of data locality of task slots?
>
> On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <fhueske@apache.org>
> wrote:
>
>> Hi Flavio,
>>
>> I am not aware of a Flink InputFormat for Parquet. However, it should
>> hopefully be covered by the Hadoop IF wrapper.
>> A dedicated Flink IF would be great though, IMO.
>>
>> Best, Fabian
>>
>> 2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Hi to all,
>>>
>>> I'd like to know whether Flink is able to exploit the Parquet format to
>>> read data efficiently from HDFS.
>>> Is there any example available?
>>>
>>> Best,
>>> Flavio
>>>
>>
>>
>
