spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: pyspark+spacy throwing pickling exception
Date Thu, 15 Feb 2018 12:14:17 GMT
You left out the exception itself, so it is hard to say for certain what is
failing. I'm also not sure how well spaCy serializes, so to debug this I would
start by moving the nlp = spacy.load('en') call inside the getPhrases function
and seeing whether it still fails.
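
A rough sketch of that change, assuming the pickling error comes from the module-level nlp object being captured in the function's closure and shipped to the executors (not confirmed, since the exception was not posted). The name get_phrases_in_partition is hypothetical; 'en', row.desc, description and f are taken from the quoted code below.

```python
def getPhrases(content):
    # Import and load inside the function so the model is created on the
    # executor instead of being pickled on the driver. This reloads the
    # model on every call, so it is slow and only meant for debugging.
    import spacy
    nlp = spacy.load('en')  # newer spaCy releases need a full model package name
    doc = nlp(str(content))
    return [chunk.text for chunk in doc.noun_chunks]


def get_phrases_in_partition(rows):
    # Hypothetical variant: load the model once per partition, then reuse
    # it for every row in that partition.
    import spacy
    nlp = spacy.load('en')
    for row in rows:
        doc = nlp(str(row.desc))
        for chunk in doc.noun_chunks:
            yield chunk.text


# Usage against the DataFrame from the quoted message (not run here):
# description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)
# description.rdd.mapPartitions(get_phrases_in_partition).foreach(f)
```

If the first form stops failing, the module-level nlp was the problem, and the per-partition form is one way to avoid paying the model load cost for every row.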

On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman <selmna@gmail.com> wrote:

> import spacy
> from pyspark.sql.functions import col
>
> nlp = spacy.load('en')
>
>
>
> def getPhrases(content):
>     phrases = []
>     doc = nlp(str(content))
>     for chunks in doc.noun_chunks:
>         phrases.append(chunks.text)
>     return phrases
>
> The function above extracts the noun phrases from the content and
> returns them as a list.
>
>
> def f(x) : print(x)
>
>
> description = xmlData.filter(col("dcterms:description").isNotNull()).select(col("dcterms:description").alias("desc"))
>
> description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)
>
> When I call getPhrases, I get the exception below:
>
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
-- 
Twitter: https://twitter.com/holdenkarau
