spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: unsure how to create 2 outputs from spark-sql udf expression
Date Fri, 27 May 2016 03:43:37 GMT
yeah that could work, since i should know (or be able to find out) all the
input columns
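
A minimal sketch of that idea, assuming Spark 2.x (`SparkSession`) and the toy data from earlier in the thread. The input columns are read from `df.columns` rather than listed by hand. The names `TwoOutputsSketch`, `inputCols`, `x` and `y` are made up for illustration. Whether the udf is really evaluated only once depends on the optimizer and is not verified here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object TwoOutputsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x session, only for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-outputs-sketch")
      .getOrCreate()
    import spark.implicits._

    // udf with a Tuple2 result; Spark exposes it as a struct with fields _1 and _2.
    val func = udf((i: Int) => (i, i))
    val df = Seq((1, 0), (2, 5)).toDF("a", "b")

    // Find out all the input columns instead of spelling them out.
    val inputCols = df.columns.map(col)

    val result = df
      // First select: every input column plus the struct column "r".
      .select((inputCols :+ func($"a").as("r")): _*)
      // Second select: input columns plus the two struct fields; "r" is not carried over.
      .select((inputCols ++ Seq($"r._1".as("x"), $"r._2".as("y"))): _*)

    result.show() // columns: a, b, x, y

    spark.stop()
  }
}
```

The struct column still exists between the two selects, but it never has to be dropped explicitly. Column names containing dots would need backtick quoting before being passed to `col`.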

On Thu, May 26, 2016 at 11:30 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Couldn't you do something like this?
>
> --
> val func = udf((i: Int) => Tuple2(i, i))
> val df = Seq((1, ..., 0), (2, ..., 5)).toDF("input", "c0", "c1", ....other
> needed columns...., "cX")
> df.select(func($"input").as("r"), $"c0", $"c1", ....$"cX").select($"r._1",
> $"r._2", $"c0", $"c1", ....$"cX")
>
> // maropu
>
>
> On Fri, May 27, 2016 at 12:15 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> yes, but i also need all the columns (plus of course the 2 new ones) in
>> my output. your select operation drops all the input columns.
>> best, koert
>>
>> On Thu, May 26, 2016 at 11:02 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
>> wrote:
>>
>>> Couldn't you include all the needed columns in your input dataframe?
>>>
>>> // maropu
>>>
>>> On Fri, May 27, 2016 at 1:46 AM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> that is nice and compact, but it does not add the columns to an
>>>> existing dataframe
>>>>
>>>> On Wed, May 25, 2016 at 11:39 PM, Takeshi Yamamuro <
>>>> linguin.m.s@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How about this?
>>>>> --
>>>>> val func = udf((i: Int) => Tuple2(i, i))
>>>>> val df = Seq((1, 0), (2, 5)).toDF("a", "b")
>>>>> df.select(func($"a").as("r")).select($"r._1", $"r._2")
>>>>>
>>>>> // maropu
>>>>>
>>>>>
>>>>> On Thu, May 26, 2016 at 5:11 AM, Koert Kuipers <koert@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> hello all,
>>>>>>
>>>>>> i have a single udf that creates 2 outputs (so a tuple 2). i would
>>>>>> like to add these 2 columns to my dataframe.
>>>>>>
>>>>>> my current solution is along these lines:
>>>>>> df
>>>>>>   .withColumn("_temp_", udf(inputColumns))
>>>>>>   .withColumn("x", col("_temp_")("_1"))
>>>>>>   .withColumn("y", col("_temp_")("_2"))
>>>>>>   .drop("_temp_")
>>>>>>
>>>>>> this works, but it's not pretty with the temporary field stuff.
>>>>>>
>>>>>> i also tried this:
>>>>>> val tmp = udf(inputColumns)
>>>>>> df
>>>>>>   .withColumn("x", tmp("_1"))
>>>>>>   .withColumn("y", tmp("_2"))
>>>>>>
>>>>>> this also works, but unfortunately the udf is evaluated twice
>>>>>>
>>>>>> is there a better way to do this?
>>>>>>
>>>>>> thanks! koert
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---
>>>>> Takeshi Yamamuro
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
