spark-user mailing list archives

From Feynman Liang <fli...@databricks.com>
Subject Re: custom RDD in java
Date Wed, 01 Jul 2015 19:29:13 GMT
On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora <shushantarora09@gmail.com>
 wrote:

> JavaRDD<String> rdd = javasparkcontext.parallelize(tables);


You are already creating an RDD in Java here ;)

However, it's not clear to me why you'd want to make this an RDD. Is the
list of tables so large that it doesn't fit on a single machine? If not,
you may be better off launching one Spark job per table, each dumping its
table through the JDBC data source
<https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>.
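
A minimal sketch of the one-job-per-table approach, assuming Spark 2.x or later (where `spark.read()` on a `SparkSession` returns a `DataFrameReader`; in Spark 1.4 the same reader comes from `sqlContext.read()`). The JDBC URL and the `hdfs:///dumps/<table>` output layout are hypothetical placeholders; `url` and `dbtable` are the JDBC data source's standard options. The Spark call is left as a comment so the driver-side loop runs without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerTableJdbcDump {

    // Options for reading one table through Spark's JDBC data source.
    // "url" and "dbtable" are real option names; the URL value is a placeholder.
    static Map<String, String> jdbcOptions(String jdbcUrl, String table) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("url", jdbcUrl);
        opts.put("dbtable", table);
        return opts;
    }

    // Hypothetical output layout: one HDFS directory per table.
    static String outputPath(String baseDir, String table) {
        return baseDir + "/" + table;
    }

    public static void main(String[] args) {
        String jdbcUrl = "jdbc:sqlserver://dbhost:1433"; // hypothetical connection URL
        List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2");
        // The loop runs on the driver; each iteration would launch one Spark job.
        for (String table : tables) {
            Map<String, String> opts = jdbcOptions(jdbcUrl, table);
            String out = outputPath("hdfs:///dumps", table);
            // Assuming a SparkSession named `spark` is available:
            //   spark.read().format("jdbc").options(opts).load().write().parquet(out);
            System.out.println("would dump " + opts.get("dbtable") + " to " + out);
        }
    }
}
```

For a single very large table, the data source's partitionColumn, lowerBound, upperBound and numPartitions options can split that one read across several tasks.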

On Wed, Jul 1, 2015 at 12:00 PM, Silvio Fiorito <
silvio.fiorito@granturing.com> wrote:

>   Sure, you can create custom RDDs. I haven’t done so in Java, but in Scala,
> absolutely.
>
>   From: Shushant Arora
> Date: Wednesday, July 1, 2015 at 1:44 PM
> To: Silvio Fiorito
> Cc: user
> Subject: Re: custom RDD in java
>
>   OK, I will evaluate these options, but is it possible to create an RDD in
> Java?
>
>
> On Wed, Jul 1, 2015 at 8:29 PM, Silvio Fiorito <
> silvio.fiorito@granturing.com> wrote:
>
>>  If all you’re doing is dumping tables from SQL Server to HDFS, have
>> you looked at Sqoop?
>>
>>  Otherwise, if you need to run this in Spark, could you just use the
>> existing JdbcRDD?
>>
>>
>>   From: Shushant Arora
>> Date: Wednesday, July 1, 2015 at 10:19 AM
>> To: user
>> Subject: custom RDD in java
>>
>>   Hi
>>
>>  Is it possible to write a custom RDD in Java?
>>
>>  The requirement: I have a list of SQL Server tables that need to be
>> dumped into HDFS.
>>
>>  So I have a
>> List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2" /* ... */);
>>
>>  then
>> JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
>>
>>  // flatMap rather than map: each table name expands into many rows
>> JavaRDD<String> tablecontent = rdd.flatMap(new FlatMapFunction<String, String>() {
>>     /* fetch the table and return its rows */ });
>>
>>  tablecontent.saveAsTextFile("hdfs path");
>>
>>
>>  Inside the function that fetches each table, I cannot keep the complete
>> table content in memory, so I want to create my own RDD to handle it.
>>
>>  Thanks
>> Shushant
>>
>
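
On Silvio's JdbcRDD suggestion: JdbcRDD takes a SQL query containing two ? placeholders together with lowerBound, upperBound and numPartitions, and gives each partition its own inclusive slice of the key range, so no single partition reads the whole table. The plain-Java sketch below illustrates that kind of range splitting so it runs without Spark; the id column and the query text are hypothetical, and this is an illustration of the idea rather than Spark's own code.

```java
import java.util.ArrayList;
import java.util.List;

class JdbcRangeSplit {

    // Splits the inclusive key range [lowerBound, upperBound] into numPartitions
    // contiguous inclusive sub-ranges {start, end}. Each pair would be bound to
    // the two '?' placeholders of the per-partition query.
    static List<long[]> split(long lowerBound, long upperBound, int numPartitions) {
        long length = upperBound - lowerBound + 1; // number of keys; assumes no overflow
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical query shape with the two bound placeholders.
        String sql = "SELECT * FROM dbname.tablename WHERE ? <= id AND id <= ?";
        for (long[] r : split(1, 100, 3)) {
            System.out.println(sql + "  -- bounds " + r[0] + ".." + r[1]);
        }
    }
}
```

With keys 1..100 and three partitions this yields the slices 1..33, 34..66 and 67..100, each read by a separate task.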

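On Shushant's concern that the fetching function cannot hold a complete table in memory: in Spark 2.x and later, FlatMapFunction.call returns an Iterator (in the 1.x API of this thread it returns an Iterable, whose iterator() could hand back the same object), so the function can return a lazy iterator that pulls one page of rows at a time, and a custom RDD should not be necessary. The sketch below is a minimal version of such an iterator in plain Java. It assumes a hypothetical fetchPage(offset, pageSize) callback that would run a paged query against SQL Server; the fakeTable generator only stands in for that query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Lazily streams rows page by page; only the current page is held in memory.
class PagedRowIterator implements Iterator<String> {
    private final BiFunction<Long, Integer, List<String>> fetchPage; // (offset, pageSize) -> rows
    private final int pageSize;
    private long offset = 0;
    private List<String> page = new ArrayList<>();
    private int pos = 0;
    private boolean exhausted = false;

    PagedRowIterator(BiFunction<Long, Integer, List<String>> fetchPage, int pageSize) {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) return true;
        if (exhausted) return false;
        // Current page consumed: fetch the next one, replacing the old page.
        page = fetchPage.apply(offset, pageSize);
        pos = 0;
        offset += page.size();
        if (page.size() < pageSize) exhausted = true; // a short page marks the end of the table
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(pos++);
    }
}

class PagedRowDemo {
    // Hypothetical stand-in for a paged table query: generates rows on demand.
    static BiFunction<Long, Integer, List<String>> fakeTable(long totalRows) {
        return (offset, n) -> {
            List<String> rows = new ArrayList<>();
            for (long i = offset; i < Math.min(offset + n, totalRows); i++) {
                rows.add("row-" + i);
            }
            return rows;
        };
    }

    public static void main(String[] args) {
        Iterator<String> it = new PagedRowIterator(fakeTable(7), 3);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Returned from the flatMap function, an iterator like this lets saveAsTextFile write rows as they are pulled instead of after the whole table has been loaded; the page size bounds how many rows sit in memory at once.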