spark-dev mailing list archives

From Ryan Blue <>
Subject Re: Data source V2 in spark 2.4.0
Date Mon, 01 Oct 2018 17:11:20 GMT
Hi Assaf,
The major changes to the V2 API that you linked to aren’t going into 2.4.
Those will be in the next release because they weren’t finished in time for
the 2.4 release.
Here are the major updates that will be in 2.4:

   - SPARK-23323: The output commit coordinator is used by default to
   ensure only one attempt of each task commits.
   - SPARK-23325 and SPARK-24971: Readers should always produce InternalRow
   instead of Row or UnsafeRow; see SPARK-23325 for details.
   - SPARK-24990: ReadSupportWithSchema was removed, and the user-supplied
   schema option was added to ReadSupport.
   - SPARK-24073: Read splits are now called InputPartition, and a few
   methods were also renamed for clarity.
   - SPARK-25127: SupportsPushDownCatalystFilters was removed because it
   leaked Expression in the public API. V2 always uses the Filter API now.
   - SPARK-24478: Push-down is now done when converting to a physical plan.
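Taken together, a minimal 2.4-style read path might look like the sketch
below. This is my own illustrative example, not code from the linked PRs:
the class names and the hard-coded five-row range are hypothetical, and it
assumes the post-change interfaces (ReadSupport.createReader, InputPartition,
and readers producing InternalRow rather than Row or UnsafeRow).

```scala
import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical source: produces a single partition of ids 0..4.
class ExampleSource extends DataSourceV2 with ReadSupport {
  override def createReader(options: DataSourceOptions): DataSourceReader =
    new ExampleReader
}

class ExampleReader extends DataSourceReader {
  override def readSchema(): StructType =
    StructType(Seq(StructField("id", IntegerType)))

  // Read splits are InputPartition after SPARK-24073.
  override def planInputPartitions(): util.List[InputPartition[InternalRow]] =
    util.Arrays.asList(new ExamplePartition(0, 5))
}

class ExamplePartition(start: Int, end: Int)
    extends InputPartition[InternalRow] {
  override def createPartitionReader(): InputPartitionReader[InternalRow] =
    new InputPartitionReader[InternalRow] {
      private var current = start - 1
      override def next(): Boolean = { current += 1; current < end }
      // InternalRow, not Row/UnsafeRow, per SPARK-23325.
      override def get(): InternalRow = InternalRow(current)
      override def close(): Unit = ()
    }
}
```

With that on the classpath, `spark.read.format("...ExampleSource").load()`
should return a five-row DataFrame; a user-supplied schema would instead go
through the two-argument createReader overload that SPARK-24990 added to
ReadSupport.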

I think there are also quite a few updates for the streaming side, but I’m
not as familiar with those so I’ll let someone else jump in with a summary.


On Mon, Oct 1, 2018 at 9:51 AM assaf.mendelson wrote:

> Hi all,
> I understood from previous threads that the Data source V2 API will see
> some
> changes in spark 2.4.0, however, I can't seem to find what these changes
> are.
> Is there some documentation which summarizes the changes?
> The only mention I seem to find is this pull request:
> Is this all of it?
> Thanks,
>     Assaf.

Ryan Blue
Software Engineer
