spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Structured streaming use of DataFrame vs Datasource
Date Thu, 16 Jun 2016 19:38:29 GMT
Is this really an internal / external distinction?

For a concrete example, Source.getBatch seems to be a public
interface, but returns DataFrame.
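(For readers following along: the relationship being debated is that Spark 2.0 defines DataFrame as a type alias for Dataset[Row], so an API like Source.getBatch that is declared to return DataFrame is, at the type level, returning Dataset[Row]. The real code is Scala; the sketch below is a hypothetical TypeScript analogue of the alias mechanism only, with stand-in Row/Dataset types, not Spark's actual classes.)

```typescript
// Stand-ins for org.apache.spark.sql.Row and Dataset (hypothetical, not Spark).
interface Row { values: unknown[] }
class Dataset<T> { constructor(public data: T[]) {} }

// Mirrors Scala's `type DataFrame = Dataset[Row]` in the sql package object:
// DataFrame is not a separate class, just another name for Dataset<Row>.
type DataFrame = Dataset<Row>;

// A Source.getBatch-style method can declare DataFrame as its return type...
function getBatch(): DataFrame {
  return new Dataset<Row>([{ values: [1, "a"] }]);
}

// ...and callers can bind the result to Dataset<Row> with no conversion,
// because the two names denote the same type.
const ds: Dataset<Row> = getBatch();
console.log(ds.data.length); // 1
```

Because the alias is purely nominal, "public interface returns DataFrame" and "public interface returns Dataset[Row]" are the same statement to the compiler; the question in this thread is which spelling the public streaming APIs should standardize on.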

On Thu, Jun 16, 2016 at 1:42 PM, Tathagata Das
<tathagata.das1565@gmail.com> wrote:
> DataFrame is a type alias of Dataset[Row], so externally it seems like
> Dataset is the main type and DataFrame is a derivative type.
> However, internally, since everything is processed as Rows, everything uses
> DataFrames. The typed objects in a Dataset are internally converted to Rows
> for processing. Therefore, internally, DataFrame is effectively the "main"
> type that is used.
>
> On Thu, Jun 16, 2016 at 11:18 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>
>> Sorry, meant DataFrame vs Dataset
>>
>> On Thu, Jun 16, 2016 at 12:53 PM, Cody Koeninger <cody@koeninger.org>
>> wrote:
>> > Is there a principled reason why sql.streaming.* and
>> > sql.execution.streaming.* are making extensive use of DataFrame
>> > instead of Datasource?
>> >
>> > Or is that just a holdover from code written before the move / type
>> > alias?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>


