spark-dev mailing list archives

From DB Tsai <...@netflix.com.INVALID>
Subject Re: Will higher order functions in spark SQL be pushed upstream?
Date Tue, 10 Oct 2017 07:33:12 GMT
Hello,

On Netflix's algorithm team, we work a lot on ranking problems, where
we naturally deal with datasets that contain nested lists of structs.
We built Scala APIs such as map, filter, drop, and withColumn that
operate on these nested lists of structs efficiently, using SQL
expressions with codegen.
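
To make the semantics concrete, here is a minimal sketch in plain
Scala collections. It does not use Spark, involves no SQL expressions
or codegen, and is not the API proposed in the JIRA; the names Struct,
filterItems, withColumn, drop, and example are hypothetical and only
illustrate what filtering, adding, and dropping fields on a nested
list of structs mean.

```scala
// Hypothetical model, not Spark code: a "struct" is a Map from field
// name to value, and a nested column value is a Seq of such structs.
object NestedStructSketch {
  type Struct = Map[String, Any]

  // Keep only the structs in the nested list that satisfy the predicate.
  def filterItems(items: Seq[Struct])(p: Struct => Boolean): Seq[Struct] =
    items.filter(p)

  // Add (or replace) one field on every struct in the nested list.
  def withColumn(items: Seq[Struct], name: String)(f: Struct => Any): Seq[Struct] =
    items.map(s => s + (name -> f(s)))

  // Remove the named fields from every struct in the nested list.
  def drop(items: Seq[Struct], names: String*): Seq[Struct] =
    items.map(s => s -- names)

  // Chain the three operations on a small, made-up nested list.
  def example(): Seq[Struct] = {
    val items: Seq[Struct] = Seq(
      Map("id" -> 1, "score" -> 0.75, "debug" -> "a"),
      Map("id" -> 2, "score" -> 0.25, "debug" -> "b")
    )
    val kept = filterItems(items)(s => s("score").asInstanceOf[Double] > 0.5)
    val boosted = withColumn(kept, "boosted")(s => s("score").asInstanceOf[Double] * 2)
    drop(boosted, "debug")
  }
}
```

Calling NestedStructSketch.example() keeps only the struct with id 1,
adds a "boosted" field to it, and removes "debug". A real
implementation would express the same operations as Catalyst
expressions over array<struct<...>> columns rather than as closures
over Maps.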

Here is our proposal for how the APIs would look; we would like to
share it with the community to get more feedback!

https://issues.apache.org/jira/browse/SPARK-22231

It would be great to share some building blocks with Databricks's
higher-order function feature.

Thanks.

On Fri, Jun 9, 2017 at 5:04 PM, Antoine HOM <antoine.hom@gmail.com> wrote:
> Good news :) Thx Sameer.
>
>
> On Friday, June 9, 2017, Sameer Agarwal <sameer@databricks.com> wrote:
>>>
>>> * As a heavy user of complex data types I was wondering if there was
>>> any plan to push those changes upstream?
>>
>>
>> Yes, we intend to contribute this to open source.
>>
>>>
>>> * In addition, I was wondering if as part of this change it also tries
>>> to solve the column pruning / filter pushdown issues with complex
>>> datatypes?
>>
>>
>> For parquet, this effort is primarily tracked via SPARK-4502 (see
>> https://github.com/apache/spark/pull/16578) and is currently targeted for
>> 2.3.

-- 
Sincerely,

DB Tsai
----------------------------------------------------------
PGP Key ID: 0x5CED8B896A6BDFA0


