spark-user mailing list archives

From Selvam Raman <sel...@gmail.com>
Subject Re: pyspark+spacy throwing pickling exception
Date Thu, 15 Feb 2018 14:34:59 GMT
Hi,

I solved the issue by extracting the method into a separate module.

Failure:
extract.py contained the whole implementation.
Because everything was in this single file, the driver tried to serialize
the spaCy (English) object and send it to the executors. That is where I hit
the pickling exception.

Success:
extract.py now only refers to the getPhrases function of spacyutils
spacyutils.py defines getPhrases

Now spaCy is initialized on the executor, so there is no need to serialize it.
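
A minimal sketch of that layout, as an illustration only. The contents of
spacyutils.py were not posted, so the lazy-loading helper (get_nlp, _load_model,
the _nlp cache) is an assumption. Only getPhrases and spacy.load('en') come
from the original code. The idea is that PySpark pickles functions defined in
the driver script by value, together with the globals they use, while a
function imported from a module is pickled by reference, so the model is
loaded inside each Python worker instead of being shipped from the driver.

```python
# spacyutils.py (hypothetical contents; the original file was not posted)

_nlp = None  # cached pipeline, one per Python worker process


def _load_model():
    # Imported here so that importing this module does not load spaCy yet.
    import spacy
    return spacy.load('en')  # model name as used in the original post


def get_nlp(loader=_load_model):
    """Return the cached pipeline, loading it on first use in this process."""
    global _nlp
    if _nlp is None:
        _nlp = loader()
    return _nlp


def getPhrases(content):
    """Return the noun phrases found in content as a list of strings."""
    doc = get_nlp()(str(content))
    return [chunk.text for chunk in doc.noun_chunks]


# extract.py (driver script) would then only reference the function:
#   from spacyutils import getPhrases
#   description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)
```

For this to work on a cluster, spacyutils.py has to be importable on the
executors, for example by passing it with --py-files to spark-submit or by
calling sc.addPyFile, and the spaCy model has to be installed on every worker.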

Please let me know whether my understanding is correct.


On Thu, Feb 15, 2018 at 12:14 PM, Holden Karau <holden@pigscanfly.ca> wrote:

> So you left out the exception. I’m also not sure how well spaCy
> serializes, so to debug this I would start by moving the nlp = assignment
> inside the function and seeing if it still fails.
>
> On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman <selmna@gmail.com> wrote:
>
>> import spacy
>>
>> nlp = spacy.load('en')
>>
>>
>>
>> def getPhrases(content):
>>     phrases = []
>>     doc = nlp(str(content))
>>     for chunks in doc.noun_chunks:
>>         phrases.append(chunks.text)
>>     return phrases
>>
>> The function above retrieves the noun phrases from the content and
>> returns them as a list.
>>
>>
>> def f(x): print(x)
>>
>>
>> description = xmlData.filter(col("dcterms:description").isNotNull()).select(col("dcterms:description").alias("desc"))
>>
>> description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)
>>
>> When I try to call getPhrases, I get the exception below.
>>
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
> --
> Twitter: https://twitter.com/holdenkarau
>



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
