spark-user mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: spark disk-to-disk
Date Mon, 23 Mar 2015 20:30:40 GMT
Maybe implement a very simple function that uses the Hadoop API to read the
files in based on their names (i.e. the individual part files)?
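
A minimal sketch of that idea (assumptions: the directory layout is the standard `part-*` naming that `saveAsObjectFile` produces, and `readByParts` is a hypothetical helper, not a Spark API):

```scala
import scala.reflect.ClassTag
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: list the part files by name, load each one as its
// own RDD, and union them in order. Because each sc.objectFile call sees
// exactly one file, files are never combined with each other on read.
def readByParts[T: ClassTag](sc: SparkContext, dir: String): RDD[T] = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val parts = fs.listStatus(new Path(dir))
    .map(_.getPath)
    .filter(_.getName.startsWith("part-"))
    .sortBy(_.getName)
    .map(_.toString)
  sc.union(parts.map(p => sc.objectFile[T](p, minPartitions = 1)))
}
```

Caveat: a single large part file can still be split by the underlying Hadoop input format; fully preventing that would need an input format whose `isSplitable` returns false.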

On Mon, Mar 23, 2015 at 10:55 AM, Koert Kuipers <koert@tresata.com> wrote:

> there is a way to reinstate the partitioner, but that requires
> sc.objectFile to read back exactly what I wrote, which means sc.objectFile
> should never split files on reading (a feature of the Hadoop file input
> format that gets in the way here).
>
> On Mon, Mar 23, 2015 at 1:39 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> I just realized the major limitation is that I lose the partitioning info...
>>
>> On Mon, Mar 23, 2015 at 1:34 AM, Reynold Xin <rxin@databricks.com> wrote:
>>
>>>
>>> On Sun, Mar 22, 2015 at 6:03 PM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> So finally I can resort to:
>>>> rdd.saveAsObjectFile(...)
>>>> sc.objectFile(...)
>>>> But that seems like a rather broken abstraction.
>>>>
>>>>
>>> This seems like a fine solution to me.
>>>
>>>
>>
>
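
For reference, a sketch of the roundtrip under discussion, with the limitation from the thread spelled out (the `HashPartitioner` and the `/tmp/rdd-out` path are illustrative assumptions, not from the original messages):

```scala
import org.apache.spark.HashPartitioner

// Write out a partitioned RDD of key-value pairs.
val data = sc.parallelize(1 to 100).map(i => (i, i * i))
  .partitionBy(new HashPartitioner(8))
data.saveAsObjectFile("/tmp/rdd-out")   // writes part-00000 ... part-00007

// Reading it back returns the same records, but as a plain RDD: the
// HashPartitioner is not reinstated, and the Hadoop input format may
// split or combine files, so even the partition boundaries can differ.
val back = sc.objectFile[(Int, Int)]("/tmp/rdd-out")
assert(back.partitioner.isEmpty)
```

This is why reinstating the partitioner requires reading back exactly the files that were written, one partition per part file.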
