spark-user mailing list archives

From Nicholas Hakobian <nicholas.hakob...@rallyhealth.com>
Subject Re: NLTK with Spark Streaming
Date Tue, 28 Nov 2017 21:45:30 GMT
Depending on your needs, it's fairly easy to write a lightweight Python
wrapper around the Databricks spark-corenlp library:
https://github.com/databricks/spark-corenlp


Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com


On Sun, Nov 26, 2017 at 8:19 AM, ashish rawat <dceashish@gmail.com> wrote:

> Thanks Holden and Chetan.
>
> Holden - Have you tried it out, and do you know the right way to do it?
> Chetan - yes, if we use a Java NLP library, there should not be any issue
> integrating with Spark Streaming, but as I pointed out earlier, we want to
> give data scientists the flexibility to use the language and library of
> their choice, instead of restricting them to a library of our choice.
>
> On Sun, Nov 26, 2017 at 9:42 PM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
>> But you can still use the Stanford NLP library and distribute it through
>> Spark, right?
>>
>> On Sun, Nov 26, 2017 at 3:31 PM, Holden Karau <holden@pigscanfly.ca>
>> wrote:
>>
>>> So it’s certainly doable (it’s not super easy, mind you), but until the
>>> Arrow UDF release goes out it will be rather slow.
>>>
>>> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceashish@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Has anyone tried running NLTK (Python) with Spark Streaming (Scala)? I
>>>> was wondering whether this is a good idea and which Spark operators are
>>>> the right ones for it. We want to try this combination because we don't
>>>> want to run our transformations in Python (PySpark), but after the
>>>> transformations we need to run some natural language processing
>>>> operations, and we don't want to restrict the functions data scientists
>>>> can use to Spark's natural language library. So Spark Streaming with
>>>> NLTK looks like the right option from the perspective of fast data
>>>> processing and data science flexibility.
>>>>
>>>> Regards,
>>>> Ashish
>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>
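
One operator that fits the original question is RDD.pipe, which streams each partition's records through an external process over stdin and stdout. Nobody in the thread proposed it, so treat what follows as a rough sketch under stated assumptions and not as a recommended design. It shows only the Python side: a line-oriented tokenizer that a Scala job could call with something like rdd.pipe("python tokenize.py") inside DStream.transform. The script name and the function names tokenize_line and pipe_tokens are hypothetical. It uses NLTK's regex-based wordpunct_tokenize because that tokenizer needs no downloaded corpora, and it falls back to the equivalent regular expression when NLTK is not installed.

```python
import re
import sys

try:
    # Regex-based NLTK tokenizer; it needs no downloaded corpora.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback so the sketch runs without NLTK installed. This is the
    # same pattern that NLTK's WordPunctTokenizer is built on.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)


def tokenize_line(line):
    """Split one input record into word and punctuation tokens."""
    return wordpunct_tokenize(line.strip())


def pipe_tokens(lines, out):
    """Write one tab-separated line of tokens per input line.

    RDD.pipe treats every line written to stdout as one output record,
    so each input record maps to exactly one output record.
    """
    for line in lines:
        out.write("\t".join(tokenize_line(line)) + "\n")
```

To use it as a pipe target, save it as a script and call pipe_tokens(sys.stdin, sys.stdout) under an if __name__ == "__main__": guard. Every record is serialized to text and crosses a process boundary, so expect overhead in line with the slowness mentioned earlier in the thread.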
