spark-user mailing list archives

From Giuseppe Celano <cel...@informatik.uni-leipzig.de>
Subject Re: Applying a Java script to many files: Java API or also Python API?
Date Thu, 28 Sep 2017 09:36:14 GMT
Hi,

What I meant is that I could run the Java script using Python's subprocess module. In
that case, is any difference in performance expected (compared with coding directly against the Java API)?
Thanks.
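For what it's worth, the subprocess approach could be sketched roughly as below. The helper and the jar invocation are hypothetical (a stand-in command replaces the actual `java -jar` call so the snippet runs without a JVM); note that each call pays process-startup cost, which is one reason batching work per partition (e.g. via RDD.mapPartitions or RDD.pipe) is usually preferable to one process per file:

```python
import subprocess
import sys

def run_external(cmd, path):
    """Run an external program on one input file and return its stdout.

    `cmd` stands in for something like ["java", "-jar", "myscript.jar"]
    (hypothetical). Inside a PySpark map function, each call would launch
    a fresh process on the executor, adding per-record startup overhead
    on top of Python<->JVM serialization.
    """
    result = subprocess.run(cmd + [path], capture_output=True,
                            text=True, check=True)
    return result.stdout

# Stand-in external program: Python echoing its argument in upper case,
# so the sketch is runnable anywhere.
out = run_external(
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
    "file-a.txt")
print(out.strip())  # FILE-A.TXT
```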



> On Sep 28, 2017, at 3:32 AM, Weichen Xu <weichen.xu@databricks.com> wrote:
> 
> I think you have to use the Spark Java API; in PySpark, functions running on Spark executors
(such as map functions) can only be written in Python.
> 
> On Thu, Sep 28, 2017 at 12:48 AM, Giuseppe Celano <celano@informatik.uni-leipzig.de
<mailto:celano@informatik.uni-leipzig.de>> wrote:
> Hi everyone,
> 
> I would like to apply a Java script to many files in parallel. I am wondering whether
I should definitely use the Spark Java API, or whether I could also run the script using the Python
API (with which I am more familiar), without this affecting performance. Thanks.
> 
> Giuseppe
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org <mailto:user-unsubscribe@spark.apache.org>
> 
> 

