spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Load Table as DataFrame
Date Wed, 18 May 2016 05:45:41 GMT
Do you have the full source code? Why do you convert a DataFrame to an RDD? That does not make
sense to me.
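
A minimal sketch of the DataFrame-level alternative being hinted at here (assuming Spark 1.x and the Phoenix Spark connector as in the quoted code; the file path, its format, and the column names are placeholders, not from the thread):

	// Keep both sides as DataFrames so Spark SQL can prune columns and let the
	// Phoenix data source handle the scan, instead of materializing the table
	// as a JavaPairRDD first.
	Map<String, String> phoenixInfoMap = new HashMap<String, String>();
	phoenixInfoMap.put("table", tableName);
	phoenixInfoMap.put("zkUrl", zkURL);

	DataFrame tableDf = sqlContext.read()
		.format("org.apache.phoenix.spark")
		.options(phoenixInfoMap)
		.load()
		.select("ID", "NAME");   // column pruning

	// Hypothetical: assumes the file can be read as a DataFrame with an "ID" column.
	DataFrame fileDf = sqlContext.read().json("/path/to/file.json");

	DataFrame joined = tableDf.join(fileDf, tableDf.col("ID").equalTo(fileDf.col("ID")));
	joined.explain(true);        // print logical/physical plans

explain(true) prints the logical and physical plans, which is one way to see how much of the work is delegated to the data source.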

> On 18 May 2016, at 06:13, Mohanraj Ragupathiraj <mohanaugust@gmail.com> wrote:
> 
> I have created a DataFrame from an HBase table (Phoenix) which has 500 million rows. From
> the DataFrame I created an RDD of JavaBeans and use it for joining with data from a file.
> 
> 	Map<String, String> phoenixInfoMap = new HashMap<String, String>();
> 	phoenixInfoMap.put("table", tableName);
> 	phoenixInfoMap.put("zkUrl", zkURL);
> 	DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(phoenixInfoMap).load();
> 	JavaRDD<Row> tableRows = df.toJavaRDD();
> 	JavaPairRDD<String, String> dbData = tableRows.mapToPair(
> 	new PairFunction<Row, String, String>()
> 	{
> 		@Override
> 		public Tuple2<String, String> call(Row row) throws Exception
> 		{
> 			return new Tuple2<String, String>(row.getAs("ID"), row.getAs("NAME"));
> 		}
> 	});
>  
> Now my question - let's say the file has 2 million unique entries matching the table.
> Will the entire table be loaded into memory as an RDD, or only the matching 2 million
> records from the table?
> 
> 
> http://stackoverflow.com/questions/37289849/phoenix-spark-load-table-as-dataframe
> 
> -- 
> Thanks and Regards
> Mohan
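
For completeness, a hypothetical sketch of the RDD-level join the question implies (the file path, its comma-separated format, and the JavaSparkContext variable sc are assumptions, not from the thread); note the comment on how a pair-RDD join evaluates both sides:

	// Hypothetical continuation: read the file and key it by ID.
	JavaPairRDD<String, String> fileData = sc.textFile("/path/to/file.txt")
		.mapToPair(new PairFunction<String, String, String>()
		{
			@Override
			public Tuple2<String, String> call(String line) throws Exception
			{
				String[] parts = line.split(",");
				return new Tuple2<String, String>(parts[0], parts[1]);
			}
		});

	// A join between two pair RDDs shuffles both sides, so every partition of
	// dbData (and hence the scan behind it) is computed; at the RDD level no
	// join predicate is pushed down to HBase/Phoenix.
	JavaPairRDD<String, Tuple2<String, String>> joined = dbData.join(fileData);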
