spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Date Mon, 16 Jul 2018 13:39:44 GMT
Hi,

I am not sure whether Spark DataFrames apply to your use case. If they do,
please try creating a UDF in Python and check whether you can call it
from Scala using select and expr.

Regards,
Gourav Sengupta

On Mon, Jul 16, 2018 at 5:32 AM, Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> Hello Jayant,
>
> Thanks for the great OSS contribution :)
>
> On Thu, Jul 12, 2018 at 1:36 PM, Jayant Shekhar <jayantbayarea@gmail.com>
> wrote:
>
>> Hello Chetan,
>>
>> Sorry I missed replying earlier. You can find some sample code here:
>>
>> http://sparkflows.readthedocs.io/en/latest/user-guide/python/pipe-python.html
>>
>> We will continue adding more there.
>>
>> Feel free to ping me directly in case of questions.
>>
>> Thanks,
>> Jayant
>>
>>
>> On Mon, Jul 9, 2018 at 9:56 PM, Chetan Khatri <
>> chetan.opensource@gmail.com> wrote:
>>
>>> Hello Jayant,
>>>
>>> Thank you so much for the suggestion. My idea was to use a Python
>>> function as a transformation that takes a couple of column names and
>>> returns an object, which is what you explained. Would it be possible to
>>> point me to a similar codebase example?
>>>
>>> Thanks.
>>>
>>> On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar <jayantbayarea@gmail.com>
>>> wrote:
>>>
>>>> Hello Chetan,
>>>>
>>>> We have currently done it with .pipe(.py) as Prem suggested.
>>>>
>>>> That passes the RDD to the Python script as CSV strings. The Python
>>>> script can either process the input line by line and write each result
>>>> back, or build something like a Pandas DataFrame for processing and
>>>> finally write the results back.
>>>>
>>>> In the Spark/Scala/Java code, you get an RDD of strings, which we
>>>> convert back to a DataFrame.
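A minimal sketch of the Python side of the .pipe(.py) approach, assuming each RDD element is a single CSV line with no embedded newlines. RDD.pipe() writes every element to the script's stdin and turns every line the script prints to stdout into an element of the resulting RDD[String]. The file name transform.py and the transform_row logic (upper-case column 0, double the integer in column 1) are hypothetical placeholders for a real transformation.

```python
#!/usr/bin/env python3
"""Hypothetical transform.py for use with Spark's RDD.pipe().

Assumption: Spark feeds one CSV line per RDD element on stdin, and every
line written to stdout becomes one element of the output RDD[String].
"""
import csv
import io
import sys
from typing import Iterable, Iterator, List


def transform_row(row: List[str]) -> List[str]:
    # Hypothetical transformation: column 0 is a name, column 1 an integer.
    # Emits the upper-cased name and the doubled quantity.
    name, qty = row[0], int(row[1])
    return [name.upper(), str(qty * 2)]


def transform_lines(lines: Iterable[str]) -> Iterator[str]:
    # Parse input as CSV so quoted fields containing commas survive intact.
    for row in csv.reader(line.rstrip("\r\n") for line in lines):
        if not row:
            continue  # csv.reader yields [] for a blank line; skip it
        buf = io.StringIO()
        # lineterminator="" so we control the newline when writing to stdout
        csv.writer(buf, lineterminator="").writerow(transform_row(row))
        yield buf.getvalue()


def main() -> None:
    # Stream line by line: one output line per input line, constant memory.
    for out in transform_lines(sys.stdin):
        sys.stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

On the Scala side this might be invoked as rdd.pipe("python3 transform.py"). The script has to be available on every executor (for example shipped with spark-submit --files), and the path it resolves to can depend on the cluster manager. The returned RDD[String] can then be parsed back into a DataFrame, e.g. by wrapping it in a Dataset[String] and passing it to spark.read.csv (available since Spark 2.2). The Pandas variant mentioned above would instead read all of stdin first, process it as one frame, and then write the results.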
>>>>
>>>> Feel free to ping me directly in case of questions.
>>>>
>>>> Thanks,
>>>> Jayant
>>>>
>>>>
>>>> On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri <
>>>> chetan.opensource@gmail.com> wrote:
>>>>
>>>>> Prem Sure, thanks for the suggestion.
>>>>>
>>>>> On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure <sparksure542@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Try .pipe(.py) on the RDD.
>>>>>>
>>>>>> Thanks,
>>>>>> Prem
>>>>>>
>>>>>> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri <
>>>>>> chetan.opensource@gmail.com> wrote:
>>>>>>
>>>>>>> Can someone please suggest an approach? Thanks.
>>>>>>>
>>>>>>> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, <
>>>>>>> chetan.opensource@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Dear Spark User / Dev,
>>>>>>>>
>>>>>>>> I would like to pass a Python user-defined function to a Spark job
>>>>>>>> developed in Scala, with the return value of that function returned
>>>>>>>> to the DataFrame / Dataset API.
>>>>>>>>
>>>>>>>> Can someone please guide me on the best approach to do this? The
>>>>>>>> Python function would mostly be a transformation function. I would
>>>>>>>> also like to pass a Java function as a String to a Spark / Scala job,
>>>>>>>> which applies it to an RDD / DataFrame and returns an RDD / DataFrame.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
