spark-dev mailing list archives

From Maciej Bryński <mac...@brynski.pl>
Subject Re: Spark 2.0 Performance drop
Date Thu, 30 Jun 2016 07:28:30 GMT
I filed 2 JIRAs.
1) Performance when querying nested columns
https://issues.apache.org/jira/browse/SPARK-16320

2) Pyspark performance
https://issues.apache.org/jira/browse/SPARK-16321

I found existing JIRAs for:
1) PPD on nested columns
https://issues.apache.org/jira/browse/SPARK-5151

2) Dropped support for df.map etc. in PySpark
https://issues.apache.org/jira/browse/SPARK-13594

2016-06-30 0:47 GMT+02:00 Michael Allman <michael@videoamp.com>:
> The patch we use in production is for 1.5. We're porting the patch to master (and downstream
to 2.0, which is presently very similar) with the intention of submitting a PR "soon". We'll
push it here when it's ready: https://github.com/VideoAmp/spark-public.
>
> Regarding benchmarking, we have a suite of Spark SQL regression tests which we run to
check correctness and performance. I can share our findings when I have them.
>
> Cheers,
>
> Michael
>
>> On Jun 29, 2016, at 2:39 PM, Maciej Bryński <maciek@brynski.pl> wrote:
>>
>> 2016-06-29 23:22 GMT+02:00 Michael Allman <michael@videoamp.com>:
>>> I'm sorry I don't have any concrete advice for you, but I hope this helps shed
some light on the current support in Spark for projection pushdown.
>>>
>>> Michael
>>
>> Michael,
>> Thanks for the answer. This resolves one of my questions.
>> Which Spark version have you patched? 1.6? Are you planning to
>> publish this patch, or is it just for the 2.0 branch?
>>
>> I'd gladly help with some benchmarking in my environment.
>>
>> Regards,
>> --
>> Maciek Bryński
>



-- 
Maciek Bryński

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

