spark-issues mailing list archives

From "Imran Rashid (JIRA)" <>
Subject [jira] [Commented] (SPARK-26019) pyspark/ "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
Date Tue, 20 Nov 2018 19:04:00 GMT


Imran Rashid commented on SPARK-26019:

Yeah, I agree with [~viirya]'s analysis; my suggestion was based on just a quick glance at the
code.  I don't think swapping those lines is likely to help at all ... but I can't come up
with any other explanation for how it happens.  From SPARK-26113, it doesn't seem particular
to the Cloudera distribution, but we'll poke at it a bit.  SPARK-26113 also makes it sound
like a race, since it works after the initial failure ...
[~Tagar], are you running a pyspark shell, or using spark-submit?  The token generation is different
in those two cases, so that might matter (though I don't see how yet ...)

[~hyukjin.kwon], for errors which appear to stem from a race, I don't think we should close them
immediately just because we can't reproduce them: races can be tricky to reproduce and may involve
something about the user's environment that we don't immediately understand, and that doesn't mean
it's not a real issue.  (I absolutely agree that if it appears to be related to a specific
distribution, it doesn't belong as an issue here.)
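For context, the traceback below ends in `len()` being applied to `None` inside the accumulator
update handler, which suggests the auth token had not been set yet when the handler serviced its
first connection (consistent with the suspected race).  A minimal sketch of that token check with
an explicit guard (hypothetical names and structure, not Spark's actual code):

```python
import io


def authenticate_and_accum_updates(rfile, auth_token):
    """Hypothetical sketch of the handler's token check.

    The reported TypeError comes from computing len() on the auth token,
    which implies the token was still None when the handler ran.  Guarding
    explicitly turns the confusing TypeError into a clear error.
    """
    if auth_token is None:
        # Suspected race: a connection was handled before the token
        # was initialized.
        raise RuntimeError("auth token not initialized yet")
    received_token = rfile.read(len(auth_token))
    if isinstance(received_token, bytes):
        received_token = received_token.decode("utf-8")
    return received_token == auth_token


# Usage with an in-memory stream standing in for the socket file object:
print(authenticate_and_accum_updates(io.BytesIO(b"secret"), "secret"))  # True
```

A guard like this would not fix the race itself, but it would replace the opaque
`TypeError: object of type 'NoneType' has no len()` with a message that points at the
initialization ordering.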

> pyspark/ "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
> ----------------------------------------------------------------------------------------------------------------
>                 Key: SPARK-26019
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
> Started happening after the 2.3.1 -> 2.3.2 upgrade.
> {code:python}
> Exception happened during processing of request from ('', 43418)
> ----------------------------------------
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/", line 290, in
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/", line 318, in
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/", line 331, in
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/", line 652, in
>     self.handle()
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/", line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/", line 238, in poll
>     if func():
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/", line 251, in authenticate_and_accum_updates
>     received_token =
> TypeError: object of type 'NoneType' has no len()
> {code}
> Error happens here:
> The PySpark code was just running a simple pipeline of
> {code:python}
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> {code}
> and then converting it to a dataframe and running a count on it.
> The error seems to be flaky; on the next rerun it didn't happen.

This message was sent by Atlassian JIRA
