spark-user mailing list archives

From Shashank Mandil <mandil.shash...@gmail.com>
Subject Re: Local spark context on an executor
Date Wed, 22 Mar 2017 13:07:48 GMT
Sqoop doesn't work on a sharded database.

Thanks,
Shashank

On Wed, Mar 22, 2017 at 5:43 AM Reynier González Tejeda <reyniergt@gmail.com>
wrote:

> Why are you using spark instead of sqoop?
>
> 2017-03-21 21:29 GMT-03:00 ayan guha <guha.ayan@gmail.com>:
>
> For JDBC to work, you can start spark-submit with the appropriate JDBC
> driver jars (using --jars); the driver will then be available on the executors.
>
> For acquiring connections, create a singleton connection per executor. I
> think the DataFrame JDBC reader (sqlContext.read.jdbc) already takes care of
> it.
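>
> For example, a read through that path might look like this (just a sketch;
> the URL, table name, and credentials are placeholders, and the MySQL driver
> jar is assumed to have been passed via --jars):
>
> import java.util.Properties
>
> val props = new Properties()
> props.setProperty("user", "dbuser")                   // placeholder credentials
> props.setProperty("password", "dbpass")
> props.setProperty("driver", "com.mysql.jdbc.Driver")  // MySQL Connector/J driver class
>
> val df = sqlContext.read.jdbc("jdbc:mysql://mysql-host:3306/mydb", "orders", props)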
>
> Finally, if you want multiple MySQL tables to be accessed in a single Spark
> job, you can create a list of tables and run a map on that list. Something
> like:
>
> def getTable(tablename: String): DataFrame
> def saveTable(d: DataFrame): Unit
>
> val tables = sc.parallelize(<List of Table>)
> tables.map(getTable).map(saveTable)
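>
> A slightly fuller sketch of that idea (assumptions: the table list is a
> plain Scala collection mapped on the driver, and the JDBC URL, credentials,
> table names, and output path are all placeholders):
>
> import java.util.Properties
> import org.apache.spark.sql.DataFrame
>
> val jdbcUrl = "jdbc:mysql://mysql-host:3306/mydb"
> val props = new Properties()
> props.setProperty("user", "dbuser")
> props.setProperty("password", "dbpass")
>
> // read one MySQL table through the JDBC data source
> def getTable(tablename: String): DataFrame =
>   sqlContext.read.jdbc(jdbcUrl, tablename, props)
>
> // write the table out to HDFS as parquet
> def saveTable(tablename: String, df: DataFrame): Unit =
>   df.write.parquet(s"hdfs:///dumps/$tablename")
>
> val tables = List("orders", "customers")
> tables.foreach(t => saveTable(t, getTable(t)))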
>
> On Wed, Mar 22, 2017 at 9:41 AM, Shashank Mandil <
> mandil.shashank@gmail.com> wrote:
>
> I am using spark to dump data from mysql into hdfs.
> The way I am doing this is by creating a spark dataframe with the metadata
> of the different mysql tables to dump from multiple mysql hosts, and then
> running a map over that dataframe to dump each mysql table's data into hdfs
> inside the executor.
>
> The reason I want a spark context is that I would like to use spark jdbc to
> read the mysql table and then the spark writer to write to hdfs.
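>
> For one table, the read and write I want to run (ideally inside the
> executor) look roughly like this (a sketch; the URL, credentials, table
> name, and HDFS path are placeholders):
>
> import java.util.Properties
>
> val props = new Properties()
> props.setProperty("user", "dbuser")
> props.setProperty("password", "dbpass")
>
> val df = sqlContext.read.jdbc("jdbc:mysql://mysql-host:3306/mydb", "orders", props)
> df.write.parquet("hdfs:///dumps/orders")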
>
> Thanks,
> Shashank
>
> On Tue, Mar 21, 2017 at 3:37 PM, ayan guha <guha.ayan@gmail.com> wrote:
>
> What is your use case? I am sure there must be a better way to solve it....
>
> On Wed, Mar 22, 2017 at 9:34 AM, Shashank Mandil <
> mandil.shashank@gmail.com> wrote:
>
> Hi All,
>
> I am using spark in yarn cluster mode.
> When I run a yarn application, it creates multiple executors on the hadoop
> datanodes for processing.
>
> Is it possible for me to create a local spark context (master=local) on
> these executors, so that I can get a spark context there?
>
> Theoretically, since each executor is a java process, this should be
> doable, shouldn't it?
>
> Thanks,
> Shashank
>
>
> --
> Best Regards,
> Ayan Guha
>
> --
> Best Regards,
> Ayan Guha
