I built this recently using the accepted answer on this SO page:
http://stackoverflow.com/questions/26741714/how-does-the-pyspark-mappartitions-function-work/26745371
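
In case it helps, here is a minimal sketch of that pattern. It is not my actual code: do_whatever is a hypothetical stand-in for your per-record logic, and the Spark call is left as a comment because it assumes a live SparkContext named sc. Since the function only needs an iterator, you can try it locally without Spark:

```python
def do_whatever(item):
    # Hypothetical per-record transformation; replace with real logic.
    return item * 2


def processor(iterator):
    """Take an iterator over one partition's records and yield new records."""
    # Any once-per-partition setup (e.g. opening a connection) would go here.
    for item in iterator:
        yield do_whatever(item)


# Local check without Spark: a plain iterator stands in for a partition.
result = list(processor(iter([1, 2, 3])))

# With a live SparkContext `sc` (assumed, not created here), the call would be:
#   newdata = sc.parallelize([1, 2, 3, 4], 2).mapPartitions(processor)
#   newdata.collect()
```

Because processor is a generator, records are handled one at a time and the whole partition is never held in memory at once.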
-sujit
On Sat, May 14, 2016 at 7:00 AM, Mathieu Longtin <mathieu@closetwork.org>
wrote:
> From memory:
> def processor(iterator):
>     for item in iterator:
>         newitem = do_whatever(item)
>         yield newitem
>
> newdata = data.mapPartitions(processor)
>
> Basically, your function takes an iterator over one partition's records
> as its argument, and must either be a generator or return an iterable.
>
> On Sat, May 14, 2016 at 12:39 AM Abi <analyst.tech.jobs@gmail.com> wrote:
>
>>
>>
>> On Tue, May 10, 2016 at 2:20 PM, Abi <analyst.tech.jobs@gmail.com> wrote:
>>
>>> Is there any example of this? I want to see how you write the
>>> iterable example
>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>