spark-user mailing list archives

From James Starks <suse...@protonmail.com.INVALID>
Subject Spark job's driver program consumes too much memory
Date Fri, 07 Sep 2018 14:04:04 GMT
I have a Spark job that reads data from a database. By increasing the submit parameter
'--driver-memory 25g' the job works without a problem locally, but not in the prod environment
because the prod master does not have enough capacity.

So I have a few questions:

- What functions, such as collect(), would cause data to be sent back to the driver program?
  My job so far merely uses `as`, `filter`, and `map` (see the first sketch after this list).

- Is it possible to write data (in parquet format, for instance) to HDFS directly from the
  executors? If so, how can I do that? Any code snippet, doc reference, or keyword to search for
  would help, since I can't find anything with e.g. `spark direct executor hdfs write` (see the
  second sketch after this list).
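
For context on the first question, here is a stripped-down sketch of the kind of pipeline I mean
(the paths, case class, and column values are made up for illustration). My understanding is that
transformations stay on the executors and only certain actions pull rows into the driver heap:

```scala
import org.apache.spark.sql.SparkSession

object DriverMemorySketch {
  case class Event(id: Long, kind: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("driver-memory-sketch").getOrCreate()
    import spark.implicits._

    val events = spark.read.parquet("hdfs:///data/events").as[Event]

    // Transformations: lazy, run on the executors, nothing is shipped to the driver.
    val ids = events.filter(_.kind == "click").map(_.id)

    // Actions that DO pull rows back into the driver JVM (the --driver-memory heap):
    val everything = ids.collect()   // all rows -> driver, can exhaust driver memory
    val sample     = ids.take(10)    // only 10 rows -> driver, usually harmless
    val howMany    = ids.count()     // only a single Long -> driver, harmless

    // An action that does NOT route the rows through the driver:
    ids.write.parquet("hdfs:///out/click_ids")  // each executor writes its own partitions

    spark.stop()
  }
}
```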
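And for the second question, this is the kind of direct executor-to-HDFS write I have in mind
(a sketch only; the JDBC URL, table, credentials, and partitioning bounds are placeholders).
As far as I can tell, the plain `write.parquet` call already does this: each task writes its own
part-files to HDFS and the driver only coordinates, so nothing needs to be collected first.
Is that correct?

```scala
import org.apache.spark.sql.SparkSession

object JdbcToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-to-parquet-sketch").getOrCreate()

    // Parallel JDBC read: partitionColumn/numPartitions split the table across executors,
    // so no single JVM has to hold the whole result set.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")   // placeholder
      .option("dbtable", "public.big_table")                 // placeholder
      .option("user", "spark")                               // placeholder
      .option("password", "secret")                          // placeholder
      .option("partitionColumn", "id")                       // numeric column to split on
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "32")
      .load()

    // The write is executed by the executors: each task streams its partition straight
    // to HDFS as parquet part-files; rows never pass through the driver.
    df.filter("status = 'ACTIVE'")
      .write
      .mode("overwrite")
      .parquet("hdfs:///warehouse/big_table_active")

    spark.stop()
  }
}
```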

Thanks