spark-user mailing list archives

From Genmao Yu <>
Subject Re: batch processing in spark
Date Mon, 06 May 2019 01:52:22 GMT
IIUC, you can use the mapPartitions transformation and pass it a function f that maps the
partition's input iterator to an output iterator. Within f you can pull several records
from the input iterator and process them together as a batch.
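As a minimal sketch of that batching pattern (names, batch size, and the placeholder computation are all illustrative; in PySpark you would pass `process_partition` to `rdd.mapPartitions`):

```python
from itertools import islice

def batched(records, batch_size):
    """Yield successive lists of up to batch_size records from an iterator."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process_partition(records):
    """Function suitable for mapPartitions: consumes the partition's record
    iterator in batches and yields one result per batch."""
    for batch in batched(records, 4):
        # Placeholder computation: here we just count the records in the batch.
        # A real job would, e.g., run inference on the whole batch of images.
        yield len(batch)

# Because mapPartitions just hands f an iterator, the same function can be
# exercised outside Spark on any plain Python iterable:
print(list(process_partition(range(10))))  # → [4, 4, 2]
```

Because f receives the whole partition as a lazy iterator, batching this way avoids materializing the partition in memory: only one batch of records is held at a time.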

> On May 6, 2019, at 2:59 AM, swastik mittal <> wrote:
> From my experience with Spark, when working on data stored in HDFS, Spark reads
> the data as individual records and computes on each record as soon as it is read.
> My data on HDFS is a set of images, where each image is one record. I want Spark
> to read multiple records before doing any computation. Any idea how I could do
> this?
