spark-user mailing list archives

From Tobias Pfeiffer
Subject Re: Adding a column to a SchemaRDD
Date Mon, 15 Dec 2014 01:59:06 GMT

On Fri, Dec 12, 2014 at 3:11 PM, Nathan Kronenfeld wrote:
> I can see how to do it if I can express the added values in SQL - just run
> "SELECT *,valueCalculation AS newColumnName FROM table"
> I've been searching all over for how to do this if my added value is a
> scala function, with no luck.
> Let's say I have a SchemaRDD with columns A, B, and C, and I want to add a
> new column, D, calculated using Utility.process(b, c), and I want (of
> course) to pass in the values B and C from each row, ending up with a new
> SchemaRDD with columns A, B, C, and D.

I guess you would have to do three things:
- map over the SchemaRDD, something like
  schemardd.map(row => { extend the row here })
  which will give you a plain RDD[Row] without a schema,
- take the schema from the schemardd and extend it manually by the name and
  type of the newly added column,
- create a new SchemaRDD from your mapped RDD and the manually extended
  schema.
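
The steps above could be sketched roughly as follows. This is only a sketch
against the Spark 1.2-era API, under a few assumptions that are not in the
original question: columns B and C are taken to be non-null Int columns, D is
taken to be an Int, the Utility.process below is a hypothetical stand-in for
the real function, and the names AddColumnSketch and withColumnD are made up
for illustration.

```scala
import org.apache.spark.sql._

object AddColumnSketch {
  // Hypothetical stand-in for the Utility.process(b, c) from the question;
  // the real function and its argument types are not shown in this thread.
  object Utility {
    def process(b: Int, c: Int): Int = b + c
  }

  // Assumption: `input` has non-null integer columns named "B" and "C".
  def withColumnD(sqlContext: SQLContext, input: SchemaRDD): SchemaRDD = {
    // Look up the column positions once, outside the closure.
    val names = input.schema.fields.map(_.name)
    val bIdx = names.indexOf("B")
    val cIdx = names.indexOf("C")

    // Step 1: map over the rows and append the computed value.
    // The result is a plain RDD[Row] without a schema.
    val extendedRows = input.map { row =>
      val d = Utility.process(row.getInt(bIdx), row.getInt(cIdx))
      Row((row.toSeq :+ d): _*)
    }

    // Step 2: extend the existing schema by the name and type of the new column.
    val extendedSchema =
      StructType(input.schema.fields :+ StructField("D", IntegerType, nullable = false))

    // Step 3: attach the extended schema to the mapped RDD to get a SchemaRDD again.
    sqlContext.applySchema(extendedRows, extendedSchema)
  }
}
```

You would then call it as AddColumnSketch.withColumnD(sqlContext, schemardd).
If B and C are not Ints, the getInt calls and the type of the new StructField
would have to change accordingly.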

Does that make sense?

