spark-dev mailing list archives

From Walaa Eldin Moustafa <wa.moust...@gmail.com>
Subject Re: [DISCUSS] SPIP: FunctionCatalog
Date Tue, 16 Feb 2021 16:51:17 GMT
Ye Xianjin, Transport <https://github.com/linkedin/transport> is a UDF API
that lets you write UDFs once and have them translated to native versions
of multiple engine UDFs, such as Spark, Presto, and Hive, or data source
UDFs such as Avro (others could be added). To get a feel for the UDF API,
you can take a look at this example
<https://github.com/linkedin/transport/blob/wmoustafa-api-v1/transportable-udfs-examples/transportable-udfs-example-udfs/src/main/java/com/linkedin/transport/examples/MapFromTwoArraysFunction.java>
(there are other examples in the same folder). The API supports type
validation and inference for nested SQL types with generics, function
overloading, and SQL-to-Java type mapping, with Java types that are easy
to work with. The API also produces efficient engine-specific (or
data-source-specific) UDFs. The efficiency stems from the fact that the
output Spark, Hive, Presto, and Avro UDFs are native to those engines/data
sources: the output UDF appears to the engine as if it were written in its
native engine API, and there is no complex data serialization or
deserialization (the engine's or data source's own data model/in-memory
format is used directly in the output UDF). More details about the
architecture can be found in the Transport blog post
<https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs>.
Transport also powers Coral
<https://engineering.linkedin.com/blog/2020/coral> to support full SQL
interoperability between engines and data sources.
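To make the write-once idea concrete, here is a minimal, hypothetical sketch of the pattern: the UDF logic is written once against an engine-neutral interface, and thin per-engine wrappers adapt it. All class and method names below are invented for illustration and are not Transport's actual API.

```java
import java.util.List;

// Hypothetical sketch of Transport's "write once, translate per engine"
// idea. These interfaces are illustrative stand-ins, NOT Transport's real
// API (see the linked MapFromTwoArraysFunction for the real thing).
public class PortableUdfSketch {

    // Engine-neutral UDF contract: the logic is written once against this.
    interface StdBinaryUdf<A, B, R> {
        R eval(A left, B right);
    }

    // Portable logic: dot product of two integer arrays.
    static final StdBinaryUdf<List<Integer>, List<Integer>, Integer> DOT =
        (xs, ys) -> {
            int sum = 0;
            for (int i = 0; i < Math.min(xs.size(), ys.size()); i++) {
                sum += xs.get(i) * ys.get(i);
            }
            return sum;
        };

    // A generated engine-specific wrapper would bind the engine's own
    // in-memory data model here instead of java.util.List, which is how
    // the translation avoids ser/de at the boundary.
    static int sparkStyleCall(List<Integer> xs, List<Integer> ys) {
        return DOT.eval(xs, ys);
    }
}
```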

Thanks,
Walaa.


On Mon, Feb 15, 2021 at 10:23 PM Ye Xianjin <advancedxy@gmail.com> wrote:

> Hi,
>
> Thanks to Ryan and Wenchen for leading this.
>
> I’d like to add my two cents here. In production environments, the
> function catalog might be used by multiple systems, such as Spark, Presto
> and Hive. Could this function catalog be designed with a unified function
> catalog in mind, or at least so that it wouldn't be difficult to extend
> it into a unified one?
>
> P.S. We registered a lot of UDFs in the Hive metastore (HMS) in our
> production environment, and those UDFs are shared by Spark and Presto. It
> works well, even though it has a lot of drawbacks.
>
> Sent from my iPhone
>
> On Feb 16, 2021, at 2:44 AM, Ryan Blue <rblue@netflix.com.invalid> wrote:
>
> 
> Thanks for the positive feedback, everyone. It sounds like there is a
> clear path forward for calling functions. Even without a prototype, the
> `invoke` plans show that Wenchen's suggested optimization can be done, and
> incorporating it as an optional extension to this proposal solves many of
> the unknowns.
>
> With that area now understood, is there any discussion about other parts
> of the proposal, besides the function call interface?
>
> On Fri, Feb 12, 2021 at 10:40 PM Chao Sun <sunchao@apache.org> wrote:
>
>> This is an important feature which can unblock several other projects
>> including bucket join support for DataSource v2, complete support for
>> enforcing DataSource v2 distribution requirements on the write path, etc. I
>> like Ryan's proposals, which look simple and elegant, with nice support
>> for function overloading and variadic arguments. On the other hand, I think
>> Wenchen made a very good point about performance. Overall, I'm excited to
>> see active discussions on this topic and believe the community will come to
>> a proposal with the best of both sides.
>>
>> Chao
>>
>> On Fri, Feb 12, 2021 at 7:58 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>>
>>> +1 for Liang-chi's.
>>>
>>> Thanks Ryan and Wenchen for leading this.
>>>
>>>
>>> On Sat, Feb 13, 2021 at 12:18 PM, Liang-Chi Hsieh <viirya@gmail.com> wrote:
>>>
>>>> Basically the proposal makes sense to me and I'd like to support the
>>>> SPIP, as it looks like we have a strong need for this important
>>>> feature.
>>>>
>>>> Thanks Ryan for working on this, and I do also look forward to
>>>> Wenchen's implementation. Thanks for the discussion too.
>>>>
>>>> Actually I think the SupportsInvoke proposed by Ryan looks like a good
>>>> alternative to me. Besides Wenchen's alternative implementation, is
>>>> there a chance we could also have the SupportsInvoke implementation
>>>> for comparison?
>>>>
>>>>
>>>> John Zhuge wrote
>>>> > Excited to see our Spark community rallying behind this important
>>>> > feature!
>>>> >
>>>> > The proposal lays a solid foundation of a minimal feature set with
>>>> > careful considerations for future optimizations and extensions.
>>>> > Can't wait to see it leading to more advanced functionalities like
>>>> > views with shared custom functions, function pushdown, lambda, etc.
>>>> > It has already borne fruit from the constructive collaborations in
>>>> > this thread. Looking forward to Wenchen's prototype and further
>>>> > discussions, including the SupportsInvoke extension proposed by Ryan.
>>>> >
>>>> >
>>>> > On Fri, Feb 12, 2021 at 4:35 PM Owen O'Malley <owen.omalley@...>
>>>> > wrote:
>>>> >
>>>> >> I think this proposal is a very good thing, giving Spark a standard
>>>> >> way of getting to and calling UDFs.
>>>> >>
>>>> >> I like having the ScalarFunction as the API to call the UDFs. It is
>>>> >> simple, yet covers all of the polymorphic type cases well. I think
>>>> >> it would also simplify using the functions in other contexts like
>>>> >> pushing down filters into the ORC & Parquet readers, although there
>>>> >> are a lot of details that would need to be considered there.
>>>> >>
>>>> >> .. Owen
>>>> >>
>>>> >>
>>>> >> On Fri, Feb 12, 2021 at 11:07 PM Erik Krogen <ekrogen@...> wrote:
>>>> >>
>>>> >>> I agree that there is a strong need for a FunctionCatalog within
>>>> >>> Spark to provide support for shareable UDFs, as well as make
>>>> >>> movement towards more advanced functionality like views which
>>>> >>> themselves depend on UDFs, so I support this SPIP wholeheartedly.
>>>> >>>
>>>> >>> I find both of the proposed UDF APIs to be sufficiently
>>>> >>> user-friendly and extensible. I generally think Wenchen's proposal
>>>> >>> is easier for a user to work with in the common case, but has
>>>> >>> greater potential for confusing and hard-to-debug behavior due to
>>>> >>> use of reflective method signature searches. The merits on both
>>>> >>> sides can hopefully be more properly examined with code, so I look
>>>> >>> forward to seeing an implementation of Wenchen's ideas to provide a
>>>> >>> more concrete comparison. I am optimistic that we will not let the
>>>> >>> debate over this point unreasonably stall the SPIP from making
>>>> >>> progress.
>>>> >>>
>>>> >>> Thank you to both Wenchen and Ryan for your detailed consideration
>>>> >>> and evaluation of these ideas!
>>>> >>> ------------------------------
>>>> >>> *From:* Dongjoon Hyun <dongjoon.hyun@...>
>>>> >>> *Sent:* Wednesday, February 10, 2021 6:06 PM
>>>> >>> *To:* Ryan Blue <blue@...>
>>>> >>> *Cc:* Holden Karau <holden@...>; Hyukjin Kwon <gurwls223@...>;
>>>> >>> Spark Dev List <dev@...>; Wenchen Fan <cloud0fan@...>
>>>> >>> *Subject:* Re: [DISCUSS] SPIP: FunctionCatalog
>>>> >>>
>>>> >>> BTW, I forgot to add my opinion explicitly in this thread because
>>>> >>> I was on the PR before this thread.
>>>> >>>
>>>> >>> 1. The `FunctionCatalog API` PR was made on May 9, 2019 and has
>>>> >>> been there for almost two years.
>>>> >>> 2. I already gave my +1 on that PR last Saturday because I agreed
>>>> >>> with the latest updated design docs and AS-IS PR.
>>>> >>>
>>>> >>> And, the rest of the progress in this thread is also very
>>>> >>> satisfying to me.
>>>> >>> (e.g. Ryan's extension suggestion and Wenchen's alternative)
>>>> >>>
>>>> >>> To All:
>>>> >>> Please take a look at the design doc and the PR, and give us some
>>>> >>> opinions.
>>>> >>> We really need your participation in order to make DSv2 more
>>>> >>> complete. This will unblock other DSv2 features, too.
>>>> >>>
>>>> >>> Bests,
>>>> >>> Dongjoon.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 10:58 AM Dongjoon Hyun <dongjoon.hyun@...>
>>>> >>> wrote:
>>>> >>>
>>>> >>> Hi, Ryan.
>>>> >>>
>>>> >>> We didn't move past anything (both yours and Wenchen's). What
>>>> >>> Wenchen suggested is double-checking the alternatives with the
>>>> >>> implementation to give more momentum to our discussion.
>>>> >>>
>>>> >>> Your new suggestion about an optional extension also sounds like a
>>>> >>> new reasonable alternative to me.
>>>> >>>
>>>> >>> We are still discussing this topic together and I hope we can
>>>> >>> reach a conclusion this time (for Apache Spark 3.2) without getting
>>>> >>> stuck like last time.
>>>> >>>
>>>> >>> I really appreciate your leadership in this discussion, and the
>>>> >>> direction it is moving in looks constructive to me. Let's give some
>>>> >>> time to the alternatives.
>>>> >>>
>>>> >>> Bests,
>>>> >>> Dongjoon.
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 10:14 AM Ryan Blue <blue@...> wrote:
>>>> >>>
>>>> >>> I don’t think we should so quickly move past the drawbacks of this
>>>> >>> approach. The problems are significant enough that using invoke is
>>>> >>> not sufficient on its own. But, I think we can add it as an
>>>> >>> optional extension to shore up the weaknesses.
>>>> >>>
>>>> >>> Here’s a summary of the drawbacks:
>>>> >>>
>>>> >>>    - Magic function signatures are error-prone
>>>> >>>    - Spark would need considerable code to help users find what
>>>> >>>    went wrong
>>>> >>>    - Spark would likely need to coerce arguments (e.g., String,
>>>> >>>    Option[Int]) for usability
>>>> >>>    - It is unclear how Spark will find the Java Method to call
>>>> >>>    - Use cases that require varargs fall back to casting; users
>>>> >>>    will also get this wrong (cast to String instead of UTF8String)
>>>> >>>    - The non-codegen path is significantly slower
>>>> >>>
>>>> >>> The benefit of invoke is to avoid moving data into a row, like
>>>> >>> this:
>>>> >>>
>>>> >>> -- using invoke
>>>> >>> int result = udfFunction(x, y)
>>>> >>>
>>>> >>> -- using row
>>>> >>> udfRow.update(0, x); -- actual: values[0] = x;
>>>> >>> udfRow.update(1, y);
>>>> >>> int result = udfFunction(udfRow);
>>>> >>>
>>>> >>> And, again, that won’t actually help much in cases that require
>>>> >>> varargs.
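The same comparison as a runnable toy, where ToyRow and both udfFunction overloads are invented stand-ins rather than Spark's real InternalRow or UDF classes:

```java
// Toy comparison of the two call paths: direct typed invoke vs. packing
// arguments into a generic row first. Neither class is a real Spark API.
public class InvokeVsRow {

    // "Invoke" style: typed parameters, no packing into a row.
    static int udfFunction(int x, int y) {
        return x + y;
    }

    // Generic row used by the row-based path (stand-in for InternalRow).
    static class ToyRow {
        final Object[] values;
        ToyRow(int size) { values = new Object[size]; }
        void update(int ordinal, Object value) { values[ordinal] = value; }
        int getInt(int ordinal) { return (Integer) values[ordinal]; }
    }

    // Row style: the UDF reads its arguments back out of the row.
    static int udfFunction(ToyRow row) {
        return row.getInt(0) + row.getInt(1);
    }

    static int viaInvoke(int x, int y) {
        return udfFunction(x, y);
    }

    static int viaRow(int x, int y) {
        ToyRow row = new ToyRow(2);
        row.update(0, x);  // extra write per argument, with boxing
        row.update(1, y);
        return udfFunction(row);
    }
}
```

Both paths compute the same result; the row path just pays for the intermediate writes and boxing.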
>>>> >>>
>>>> >>> I suggest we add a new marker trait for BoundMethod called
>>>> >>> SupportsInvoke. If that interface is implemented, then Spark will
>>>> >>> look for a method that matches the expected signature based on the
>>>> >>> bound input type. If it isn’t found, Spark can print a warning and
>>>> >>> fall back to the InternalRow call: “Cannot find udfFunction(int,
>>>> >>> int)”.
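A runnable sketch of that lookup-with-fallback behavior, using invented stand-in names (marker interface, `invoke`, `produceResult`) rather than the proposed Spark interfaces:

```java
import java.lang.reflect.Method;

// Sketch of the fallback described above: if a function opts in via a
// marker interface, look for a typed method matching the bound input
// types; otherwise use the generic row-based call. Illustrative only.
public class SupportsInvokeSketch {

    // Marker: "this function may expose a typed invoke method".
    interface SupportsInvoke {}

    static class AddFunction implements SupportsInvoke {
        public int invoke(int x, int y) { return x + y; }
        public Object produceResult(Object[] row) {  // row-based path
            return (Integer) row[0] + (Integer) row[1];
        }
    }

    static Object call(Object udf, Object[] args, Class<?>... boundTypes) {
        if (udf instanceof SupportsInvoke) {
            try {
                // Look for a method matching the bound input types.
                Method m = udf.getClass().getMethod("invoke", boundTypes);
                return m.invoke(udf, args);
            } catch (ReflectiveOperationException e) {
                System.err.println("Cannot find typed invoke; using row path");
            }
        }
        try {
            // Fall back to the generic row-based call.
            Method m = udf.getClass().getMethod("produceResult", Object[].class);
            return m.invoke(udf, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```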
>>>> >>>
>>>> >>> This approach allows the invoke optimization, but solves many of
>>>> >>> the problems:
>>>> >>>
>>>> >>>    - The method to invoke is found using the proposed load and
>>>> >>>    bind approach
>>>> >>>    - Magic function signatures are optional and do not cause
>>>> >>>    runtime failures
>>>> >>>    - Because this is an optional optimization, Spark can be more
>>>> >>>    strict about types
>>>> >>>    - Varargs cases can still use rows
>>>> >>>    - Non-codegen can use an evaluation method rather than falling
>>>> >>>    back to slow Java reflection
>>>> >>>
>>>> >>> This seems like a good extension to me; this provides a plan for
>>>> >>> optimizing the UDF call to avoid building a row, while the
>>>> >>> existing proposal covers the other cases well and addresses how to
>>>> >>> locate these function calls.
>>>> >>>
>>>> >>> This also highlights that the approach used in DSv2 and this
>>>> >>> proposal is working: start small and use extensions to layer on
>>>> >>> more complex support.
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 9:04 AM Dongjoon Hyun <dongjoon.hyun@...>
>>>> >>> wrote:
>>>> >>>
>>>> >>> Thank you all for making a giant move forward for Apache Spark
>>>> >>> 3.2.0.
>>>> >>> I'm really looking forward to seeing Wenchen's implementation.
>>>> >>> That would be greatly helpful to make a decision!
>>>> >>>
>>>> >>> > I'll implement my idea after the holiday and then we can have
>>>> >>> more effective discussions. We can also do benchmarks and get some
>>>> >>> real numbers.
>>>> >>> > FYI: the Presto UDF API
>>>> >>> <https://prestodb.io/docs/current/develop/functions.html> also
>>>> >>> takes individual parameters instead of the row parameter. I think
>>>> >>> this direction is at least worth a try so that we can see the
>>>> >>> performance difference. It's also mentioned in the design doc as an
>>>> >>> alternative (Trino).
>>>> >>>
>>>> >>> Bests,
>>>> >>> Dongjoon.
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Feb 9, 2021 at 10:18 PM Wenchen Fan <cloud0fan@...> wrote:
>>>> >>>
>>>> >>> FYI: the Presto UDF API
>>>> >>> <https://prestodb.io/docs/current/develop/functions.html> also
>>>> >>> takes individual parameters instead of the row parameter. I think
>>>> >>> this direction is at least worth a try so that we can see the
>>>> >>> performance difference. It's also mentioned in the design doc as an
>>>> >>> alternative (Trino).
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 10:18 AM Wenchen Fan <cloud0fan@...> wrote:
>>>> >>>
>>>> >>> Hi Holden,
>>>> >>>
>>>> >>> As Hyukjin said, following existing designs is not the principle
>>>> >>> of DS v2 API design. We should make sure the DS v2 API makes
>>>> >>> sense. AFAIK we didn't fully follow the catalog API design from
>>>> >>> Hive, and I believe Ryan also agrees with that.
>>>> >>>
>>>> >>> I think the problem here is that we were discussing some very
>>>> >>> detailed things without actual code. I'll implement my idea after
>>>> >>> the holiday and then we can have more effective discussions. We can
>>>> >>> also do benchmarks and get some real numbers.
>>>> >>>
>>>> >>> In the meantime, we can continue to discuss other parts of this
>>>> >>> proposal, and make a prototype if possible. Spark SQL has many
>>>> >>> active contributors/committers and this thread hasn't gotten much
>>>> >>> attention yet.
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 6:17 AM Hyukjin Kwon <gurwls223@...> wrote:
>>>> >>>
>>>> >>> Just dropping a few lines. I remember that one of the goals in
>>>> >>> DSv2 is to correct the mistakes we made in the current Spark
>>>> >>> codebase.
>>>> >>> There would not be much point if we just follow and mimic what
>>>> >>> Spark currently does. It might just end up with another copy of
>>>> >>> Spark's APIs, e.g. the (internal) Expression APIs. I sincerely
>>>> >>> would like to avoid this.
>>>> >>> I do believe we have been stuck mainly due to trying to come up
>>>> >>> with a better design. We already have an ugly picture of the
>>>> >>> current Spark APIs to draw a better, bigger picture.
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Feb 10, 2021 at 3:28 AM, Holden Karau <holden@...> wrote:
>>>> >>>
>>>> >>> I think this proposal is a good set of trade-offs and has existed
>>>> >>> in the community for a long period of time. I especially
>>>> >>> appreciate how the design is focused on a minimal useful
>>>> >>> component, with future optimizations considered from a point of
>>>> >>> view of making sure it's flexible, but actual concrete decisions
>>>> >>> left for the future once we see how this API is used. I think if
>>>> >>> we try and optimize everything right out of the gate, we'll
>>>> >>> quickly get stuck (again) and not make any progress.
>>>> >>>
>>>> >>> On Mon, Feb 8, 2021 at 10:46 AM Ryan Blue <blue@...> wrote:
>>>> >>>
>>>> >>> Hi everyone,
>>>> >>>
>>>> >>> I'd like to start a discussion for adding a FunctionCatalog
>>>> >>> interface to catalog plugins. This will allow catalogs to expose
>>>> >>> functions to Spark, similar to how the TableCatalog interface
>>>> >>> allows a catalog to expose tables. The proposal doc is available
>>>> >>> here:
>>>> >>> https://docs.google.com/document/d/1PLBieHIlxZjmoUB0ERF-VozCRJ0xw2j3qKvUNWpWA2U/edit
>>>> >>>
>>>> >>> Here's a high-level summary of some of the main design choices:
>>>> >>> * Adds the ability to list and load functions, not to create or
>>>> >>> modify them in an external catalog
>>>> >>> * Supports scalar, aggregate, and partial aggregate functions
>>>> >>> * Uses load and bind steps for better error messages and simpler
>>>> >>> implementations
>>>> >>> * Like the DSv2 table read and write APIs, it uses InternalRow to
>>>> >>> pass data
>>>> >>> * Can be extended using mix-in interfaces to add vectorization,
>>>> >>> codegen, and other future features
>>>> >>>
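As a rough illustration, the load-then-bind flow might look like the following, using simplified stand-in interfaces; the actual proposed Java interfaces are in the PR and differ in names and detail:

```java
// Simplified sketch of the load-then-bind flow: the catalog resolves a
// name to an unbound function, and binding validates argument types and
// yields a callable form. Stand-in interfaces only, not the proposed API.
public class FunctionCatalogSketch {

    interface UnboundFunction {
        // Bind validates the argument types and returns a callable form,
        // so type errors surface early with a clear message.
        BoundFunction bind(Class<?>[] argTypes);
    }

    interface BoundFunction {
        Object produceResult(Object[] row);  // stand-in for InternalRow
    }

    // A catalog maps names to unbound functions (the load step).
    static UnboundFunction loadFunction(String name) {
        if (!"add".equals(name)) {
            throw new IllegalArgumentException("Function not found: " + name);
        }
        return argTypes -> {
            // The bind step: reject bad arity before any data flows.
            if (argTypes.length != 2) {
                throw new IllegalArgumentException("add expects 2 arguments");
            }
            return row -> (Integer) row[0] + (Integer) row[1];
        };
    }
}
```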
>>>> >>> There is also a PR with the proposed API:
>>>> >>> https://github.com/apache/spark/pull/24559/files
>>>> >>>
>>>> >>> Let's discuss the proposal here rather than on that PR, to get
>>>> >>> better visibility. Also, please take the time to read the proposal
>>>> >>> first. That really helps clear up misconceptions.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Ryan Blue
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Twitter: https://twitter.com/holdenkarau
>>>> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>> >>> https://amzn.to/2MaRAG9
>>>> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> >>>
>>>> >>> --
>>>> >>> Ryan Blue
>>>> >>>
>>>> >>>
>>>> >
>>>> > --
>>>> > John Zhuge
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
