spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: text processing in spark (Spark job stucks for several minutes)
Date Thu, 26 Oct 2017 07:15:13 GMT
Please provide the source code and any exceptions that appear in the executor and/or driver logs.


> On 26. Oct 2017, at 08:42, Donni Khan <prince.donnii@googlemail.com> wrote:
> 
> Hi,
> I'm applying preprocessing methods to a large text dataset using Spark with Java. I wrote
> my own NLP pipeline in plain Java and call it in the map function like this:
> 
> MyRDD.map(call nlp pipeline for each row)
> 
> I run my job on a cluster of 14 machines (32 cores and about 140 GB each). The job runs
> correctly and distributes the documents across the executors, but it gets stuck on the last
> task for several minutes.
> Looking at the job details, I found that most of the documents are processed across several
> executors, but a single task gets stuck on a small number of documents. It looks like the task
> is waiting for something; after 10-20 minutes it continues processing the remaining documents
> and finishes.
> 
> I also tried different configurations, but the behavior is the same.
> Any help?
> 
> thanks,
> Donni
> 
> 
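
A minimal sketch of one way to collect per-document evidence for a straggler task like the one described in the quoted message: wrap the per-row function in a timer and record every document whose processing exceeds a threshold. This is plain Java so that it runs without a cluster. The names `SlowDocProbe`, `timed`, `nlpPipeline`, and the 500 ms threshold are hypothetical and do not come from the original message, and the toy tokenizer only stands in for the real NLP pipeline. In an actual Spark job the wrapped function would be called inside the map function and the record would be written to the executor log rather than to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SlowDocProbe {

    /** Hypothetical stand-in for the NLP pipeline: counts whitespace-separated tokens. */
    static int nlpPipeline(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Wraps a per-document function. Any call that takes at least
     * thresholdMs milliseconds adds an entry (document length and elapsed
     * time) to slowLog; the wrapped function's result is returned unchanged.
     */
    static <R> Function<String, R> timed(Function<String, R> fn,
                                         long thresholdMs,
                                         List<String> slowLog) {
        return doc -> {
            long start = System.nanoTime();
            R result = fn.apply(doc);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs >= thresholdMs) {
                slowLog.add("chars=" + doc.length() + " ms=" + elapsedMs);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        List<String> slowLog = new ArrayList<>();
        // Assumed threshold of 500 ms; tune it to the expected per-document cost.
        Function<String, Integer> probe = timed(SlowDocProbe::nlpPipeline, 500, slowLog);

        List<String> docs = List.of("a short document", "another short one");
        for (String doc : docs) {
            System.out.println("tokens=" + probe.apply(doc));
        }
        System.out.println("slow documents recorded: " + slowLog.size());
    }
}
```

If a few entries show very large documents or very long times, the delay is more likely caused by the input or the pipeline code than by the cluster configuration.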


