spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Date Mon, 16 Jul 2018 04:32:33 GMT
Hello Jayant,

Thanks for the great OSS contribution :)

On Thu, Jul 12, 2018 at 1:36 PM, Jayant Shekhar <jayantbayarea@gmail.com>
wrote:

> Hello Chetan,
>
> Sorry, I missed replying earlier. You can find some sample code here:
>
> http://sparkflows.readthedocs.io/en/latest/user-guide/python/pipe-python.html
>
> We will continue adding more there.
>
> Feel free to ping me directly in case of questions.
>
> Thanks,
> Jayant
>
>
> On Mon, Jul 9, 2018 at 9:56 PM, Chetan Khatri <chetan.opensource@gmail.com>
> wrote:
>
>> Hello Jayant,
>>
>> Thank you so much for the suggestion. My idea was to use a Python function
>> as a transformation that takes a couple of column names and returns an
>> object, as you explained. Would it be possible to point me to a similar
>> example codebase?
>>
>> Thanks.
>>
>> On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar <jayantbayarea@gmail.com>
>> wrote:
>>
>>> Hello Chetan,
>>>
>>> We have currently done it with .pipe(.py), as Prem suggested.
>>>
>>> That passes the RDD as CSV strings to the Python script. The Python
>>> script can either process the input line by line, creating each result
>>> and writing it back, or build something like a Pandas DataFrame for
>>> processing and write the results back at the end.
>>>
>>> In the Spark/Scala/Java code, you get an RDD of strings, which we convert
>>> back to a DataFrame.
>>>
>>> Feel free to ping me directly in case of questions.
>>>
>>> Thanks,
>>> Jayant
>>>
>>>
>>> On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri <
>>> chetan.opensource@gmail.com> wrote:
>>>
>>>> Prem Sure, thanks for the suggestion.
>>>>
>>>> On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure <sparksure542@gmail.com>
>>>> wrote:
>>>>
>>>>> Try .pipe(.py) on the RDD.
>>>>>
>>>>> Thanks,
>>>>> Prem
>>>>>
>>>>> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri <
>>>>> chetan.opensource@gmail.com> wrote:
>>>>>
>>>>>> Can someone please advise? Thanks.
>>>>>>
>>>>>> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, <
>>>>>> chetan.opensource@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Dear Spark User / Dev,
>>>>>>>
>>>>>>> I would like to pass a Python user-defined function to a Spark job
>>>>>>> developed in Scala, with the return value of that function returned
>>>>>>> to the DF / Dataset API.
>>>>>>>
>>>>>>> Can someone please guide me on the best approach to do this? The
>>>>>>> Python function would mostly be a transformation function. I would
>>>>>>> also like to pass a Java function as a String to the Spark / Scala
>>>>>>> job, have it applied to an RDD / DataFrame, and get an RDD /
>>>>>>> DataFrame back.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
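
For readers of the archive, the .pipe() approach described in the quoted
messages can be sketched roughly as below. This is only an illustration, not
the code from the linked Sparkflows page: the transform() function, its
"sum of the first two columns" logic, and the process() helper are all
hypothetical. The one assumption taken from the thread is that each RDD
element arrives on stdin as one CSV line and that each line printed to stdout
becomes one element of the resulting RDD of strings.

```python
#!/usr/bin/env python3
"""Hypothetical script to be run through RDD.pipe() from a Scala job."""
import csv
import io
import sys


def transform(row):
    """Illustrative transformation: append the sum of the first two columns."""
    a, b = float(row[0]), float(row[1])
    return row + [str(a + b)]


def process(lines, out):
    """Parse each CSV line, apply transform(), and write one CSV line back."""
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines):
        if row:  # csv.reader yields [] for blank lines; skip them
            writer.writerow(transform(row))


# Only consume stdin when data is actually piped in (as Spark does).
if __name__ == "__main__" and not sys.stdin.isatty():
    process(sys.stdin, sys.stdout)
```

On the Scala side, assuming the script is saved under a hypothetical name such
as transform.py and is available on every executor, something like
rdd.pipe("python3 transform.py") would return an RDD[String] that can then be
parsed back into a DataFrame, as described in the thread.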
