spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: unsure how to create 2 outputs from spark-sql udf expression
Date Fri, 27 May 2016 03:15:42 GMT
yes, but i also need all the columns (plus of course the 2 new ones) in my
output. your select operation drops all the input columns.
best, koert
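
A minimal sketch of one way to keep every input column while adding the two outputs, assuming Spark 2.x or later (`SparkSession`) and a local session. The udf body and the output names `x` and `y` are placeholders, not taken from the thread. The idea is to select `*` alongside the struct column, then project the original columns plus the struct fields, so no explicit `drop` is needed. This does not by itself guarantee that the udf runs only once per row: the optimizer may collapse the two projections, so it is worth checking `result.explain()`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for illustration only (assumption: Spark 2.x+ on the classpath).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("two-output-udf")
  .getOrCreate()
import spark.implicits._

// Placeholder udf returning a tuple; Spark exposes it as a struct with fields _1 and _2.
val func = udf((i: Int) => (i + 1, i * 2))

val df = Seq((1, 0), (2, 5)).toDF("a", "b")

// All original columns, followed by the two struct fields under their final names.
val outputCols = df.columns.map(col) ++ Seq(col("r._1").as("x"), col("r._2").as("y"))

val result = df
  .select(col("*"), func(col("a")).as("r")) // keeps a and b, adds struct column r
  .select(outputCols: _*)                   // unpacks r into x and y, leaving r behind

result.show()
```

The intermediate column `r` still exists between the two selects, so this is mostly a tidier spelling of the `withColumn`/`drop` approach quoted below rather than a fundamentally different plan.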

On Thu, May 26, 2016 at 11:02 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Couldn't you include all the needed columns in your input dataframe?
>
> // maropu
>
> On Fri, May 27, 2016 at 1:46 AM, Koert Kuipers <koert@tresata.com> wrote:
>
>> that is nice and compact, but it does not add the columns to an existing
>> dataframe
>>
>> On Wed, May 25, 2016 at 11:39 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How about this?
>>> --
>>> val func = udf((i: Int) => Tuple2(i, i))
>>> val df = Seq((1, 0), (2, 5)).toDF("a", "b")
>>> df.select(func($"a").as("r")).select($"r._1", $"r._2")
>>>
>>> // maropu
>>>
>>>
>>> On Thu, May 26, 2016 at 5:11 AM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> hello all,
>>>>
>>>> i have a single udf that creates 2 outputs (so a tuple 2). i would like
>>>> to add these 2 columns to my dataframe.
>>>>
>>>> my current solution is along these lines:
>>>> df
>>>>   .withColumn("_temp_", udf(inputColumns))
>>>>   .withColumn("x", col("_temp_")("_1"))
>>>>   .withColumn("y", col("_temp_")("_2"))
>>>>   .drop("_temp_")
>>>>
>>>> this works, but it's not pretty with the temporary field stuff.
>>>>
>>>> i also tried this:
>>>> val tmp = udf(inputColumns)
>>>> df
>>>>   .withColumn("x", tmp("_1"))
>>>>   .withColumn("y", tmp("_2"))
>>>>
>>>> this also works, but unfortunately the udf is evaluated twice
>>>>
>>>> is there a better way to do this?
>>>>
>>>> thanks! koert
>>>>
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
