spark-dev mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)
Date Tue, 04 Oct 2016 22:02:55 GMT
>
> I don't quite understand why exposing it indirectly through a typed
> interface should be delayed before finalizing the API.
>

Spark has a long history
<https://spark-project.atlassian.net/browse/SPARK-1094> of maintaining
binary compatibility in its public APIs.  I strongly believe this is one of
the things that has made the project successful.  Exposing internals that
we know are going to change in the primary user facing API for creating
Streaming DataFrames seems to run directly counter to this goal.  I think
the argument that "you can do it anyway" fails to account for the
expectations of users who probably aren't following this discussion
closely.

If advanced users want to dig through the code and experiment, great.  I
hope they report back on what's good and what can be improved.  However, if
you add the function suggested in the PR to DataStreamReader, you are
giving them a bad experience by leaking internals that don't even show up
in the published documentation.
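
For context, the "internals" in question are the experimental sink interfaces
in `org.apache.spark.sql.execution.streaming`.  A rough sketch of what an
advanced user experimenting today might write (class names here are
hypothetical, and these Spark 2.0-era signatures are exactly the unstable
surface being discussed, so expect them to change):

```scala
package com.example

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink  // internal, undocumented
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// A toy sink that just logs the row count of each micro-batch.
class RowCountSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    println(s"batch $batchId: ${data.count()} rows")
  }
}

// Registered by passing the provider's fully qualified class name
// to DataStreamWriter.format(...).
class RowCountSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new RowCountSink
}
```

Usage would look like
`df.writeStream.format("com.example.RowCountSinkProvider").start()` — possible
today precisely because the provider lookup accepts any class name, which is
the "you can do it anyway" path, as opposed to blessing it with a typed method
on the public reader/writer API.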
