spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: unsure how to create 2 outputs from spark-sql udf expression
Date Fri, 27 May 2016 03:02:39 GMT
Couldn't you include all the needed columns in your input dataframe?

// maropu
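
One possible reading of the suggestion above, as a rough sketch rather than a confirmed answer from the thread: keep every existing column in the same select that calls the udf, then unpack the struct in a second select. This assumes Spark 2.x or later (for SparkSession) and a local master; the names func, result, x, and y and the doubling logic are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: local single-threaded session, just for trying this out.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("udf-two-outputs")
  .getOrCreate()
import spark.implicits._

// Hypothetical udf returning a tuple; Spark exposes it as a struct with fields _1 and _2.
val func = udf((i: Int) => (i, i * 2))

val df = Seq((1, 0), (2, 5)).toDF("a", "b")

val result = df
  // carry all existing columns along with the struct produced by the udf
  .select(col("*"), func($"a").as("r"))
  // unpack the struct into named columns; "r" itself is not selected, so nothing to drop
  .select($"a", $"b", $"r._1".as("x"), $"r._2".as("y"))

result.show()
```

Note that this only removes the explicit temporary column and the drop. Whether the udf runs once or twice per row is up to the optimizer, which may merge the two selects, so it is worth checking the plan with result.explain() if the udf is expensive.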

On Fri, May 27, 2016 at 1:46 AM, Koert Kuipers <koert@tresata.com> wrote:

> that is nice and compact, but it does not add the columns to an existing
> dataframe
>
> On Wed, May 25, 2016 at 11:39 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
> wrote:
>
>> Hi,
>>
>> How about this?
>> --
>> val func = udf((i: Int) => Tuple2(i, i))
>> val df = Seq((1, 0), (2, 5)).toDF("a", "b")
>> df.select(func($"a").as("r")).select($"r._1", $"r._2")
>>
>> // maropu
>>
>>
>> On Thu, May 26, 2016 at 5:11 AM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> hello all,
>>>
>>> i have a single udf that creates 2 outputs (so a tuple 2). i would like
>>> to add these 2 columns to my dataframe.
>>>
>>> my current solution is along these lines:
>>> df
>>>   .withColumn("_temp_", udf(inputColumns))
>>>   .withColumn("x", col("_temp_")("_1"))
>>>   .withColumn("y", col("_temp_")("_2"))
>>>   .drop("_temp_")
>>>
>>> this works, but it's not pretty with the temporary field stuff.
>>>
>>> i also tried this:
>>> val tmp = udf(inputColumns)
>>> df
>>>   .withColumn("x", tmp("_1"))
>>>   .withColumn("y", tmp("_2"))
>>>
>>> this also works, but unfortunately the udf is evaluated twice
>>>
>>> is there a better way to do this?
>>>
>>> thanks! koert
>>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro
