spark-user mailing list archives

From Selvam Raman <sel...@gmail.com>
Subject pyspark+spacy throwing pickling exception
Date Thu, 15 Feb 2018 12:08:24 GMT
import spacy

nlp = spacy.load('en')



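def getPhrases(content):
    # Parse the text with the module-level spaCy pipeline and
    # collect the text of every noun chunk it finds.
    phrases = []
    doc = nlp(str(content))
    for chunk in doc.noun_chunks:
        phrases.append(chunk.text)
    return phrases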

The above function retrieves the noun phrases from the content and
returns them as a list.


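from pyspark.sql.functions import col


def f(x):
    print(x)


# Keep only the rows that have a description and expose it as "desc".
description = xmlData.filter(col("dcterms:description").isNotNull()).select(col("dcterms:description").alias("desc"))

# Extract the noun phrases from every description and print them.
description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)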

When I call getPhrases this way, I get a pickling exception.
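
One possible cause, assuming the exception arises because the module-level nlp object is captured by the lambda and has to be pickled on the driver, is that the loaded spaCy pipeline cannot be serialized. A minimal sketch of a commonly used workaround is to load the model on the executors, once per partition. The names phrases_in_partition and load_model are hypothetical and not part of the original code; load_model exists only so the function can be exercised without spaCy installed.

```python
def phrases_in_partition(rows, load_model=None):
    """Yield the noun-phrase strings for every row of one partition."""
    if load_model is None:
        # Assumption: spaCy and its 'en' model are installed on every executor.
        # Importing and loading here means no spaCy object is pickled
        # on the driver and shipped with the closure.
        import spacy
        load_model = lambda: spacy.load('en')

    nlp = load_model()  # loaded once per partition, not once per row

    for row in rows:
        doc = nlp(str(row.desc))
        for chunk in doc.noun_chunks:
            yield chunk.text
```

Under the same assumption, the flatMap call would become description.rdd.mapPartitions(phrases_in_partition).foreach(f). Loading per partition keeps the model out of the closure at the cost of one model load for each partition.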



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
