spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
Date Wed, 22 Apr 2020 03:21:56 GMT
No, that's not a thing to apologize for. It's just your call - less context
would bring less reaction and interest.

On Wed, Apr 22, 2020 at 11:50 AM Ruijing Li <liruijing09@gmail.com> wrote:

> I apologize, but I cannot share it, even if it is just typical spark
> libraries. I definitely understand that limits debugging help, but wanted
> to understand if anyone has encountered a similar issue.
>
> On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
> wrote:
>
>> If there are no third-party libraries in the dump, then why not share the
>> thread dump? (I mean, the output of jstack.)
>>
>> A stack trace would be more helpful to find which thread acquired the lock
>> and which other threads are waiting to acquire it, if we suspect a deadlock.
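As a sketch of the jstack capture suggested here (assuming the driver runs in a JVM visible to `jps`; the grep pattern and output path below are assumptions, not from the thread):

```shell
# Sketch: capture one thread dump of the driver JVM with jstack (ships with
# the JDK). The jps grep pattern and output path are placeholders -- adjust
# them for your deployment.
PID=$(jps -l 2>/dev/null | grep -i sparksubmit | awk '{print $1}' | head -1)
PID="${PID:-$$}"   # placeholder fallback so the sketch runs anywhere
# -l also prints owned monitors and java.util.concurrent locks
jstack -l "$PID" > /tmp/driver-threads.txt 2>/dev/null || : > /tmp/driver-threads.txt
echo "dump written to /tmp/driver-threads.txt"
```

On a real driver host the fallback would be dropped and the actual JVM PID used.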
>>
>> On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <liruijing09@gmail.com> wrote:
>>
>>> After refreshing a couple of times, I notice the lock is being swapped
>>> between these 3. The other 2 will be blocked by whichever thread gets
>>> this lock, in a cycle: 160 has lock -> 161 -> 159 -> 160.
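A suspected cycle like this can also be sanity-checked against a raw jstack dump by counting waiters per monitor address; a minimal sketch over a made-up excerpt (the thread names and monitor address below are illustrative, not from the actual job):

```shell
# Hypothetical excerpt of a jstack dump reproducing the suspected contention
# (thread names and the monitor address are made up for illustration).
cat > /tmp/ui-dump.txt <<'EOF'
"SparkUI-160-acceptor" #160 RUNNABLE
	- locked <0x00000000f00d> (a java.lang.Object)
"SparkUI-161-acceptor" #161 BLOCKED
	- waiting to lock <0x00000000f00d> (a java.lang.Object)
"SparkUI-159-acceptor" #159 BLOCKED
	- waiting to lock <0x00000000f00d> (a java.lang.Object)
EOF
# Several "waiting to lock" hits on one address, plus the single "locked"
# line, identify the holder to inspect first.
grep -o 'waiting to lock <[^>]*>' /tmp/ui-dump.txt | sort | uniq -c
```

Here this would report 2 waiters on `<0x00000000f00d>`, matching the 161/159 blocked-on-160 pattern described above.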
>>>
>>> On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijing09@gmail.com>
>>> wrote:
>>>
>>>> In the thread dump, I do see this:
>>>> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
>>>> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>>>> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>>>>
>>>> Could the fact that 160 has the monitor but is not running be causing a
>>>> deadlock preventing the job from finishing?
>>>>
>>>> I do see my Finalizer and main method are waiting. I don’t see any
>>>> other threads from 3rd party libraries or my code in the dump. I do see
>>>> the Spark context cleaner is in timed waiting.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijing09@gmail.com>
>>>> wrote:
>>>>
>>>>> Strangely enough I found an old issue that is the exact same issue as
>>>>> mine
>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>>>>
>>>>> However I’m using spark 2.4.4, so the issue should have been solved by
>>>>> now.
>>>>>
>>>>> Like the user in the jira issue I am using mesos, but I am reading
>>>>> from oracle instead of writing to Cassandra and S3.
>>>>>
>>>>>
>>>>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezhang@outlook.com> wrote:
>>>>>
>>>>>> The Thread dump result table of the Spark UI can provide some clues to
>>>>>> find thread lock issues, such as:
>>>>>>
>>>>>>   Thread ID | Thread Name                  | Thread State | Thread Locks
>>>>>>   13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951)
>>>>>>   48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951)
>>>>>>
>>>>>> And each thread row can show the call stacks after being clicked;
>>>>>> then you can check the root cause of holding locks like this (Thread 48
>>>>>> of the above):
>>>>>>
>>>>>>   org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>>>>>   org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>>>>>   jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>>>>>   <snip...>
>>>>>>
>>>>>> Hope it can help you.
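Outside the UI, the same per-thread stack can be pulled from a saved jstack output; a minimal sketch (the dump file, thread names, and frames below are made up for illustration):

```shell
# Hypothetical fragment of a saved jstack output (illustration only)
cat > /tmp/jstack-sample.txt <<'EOF'
"main" #1 WAITING
   java.lang.Thread.State: WAITING

"Thread-16" #48 RUNNABLE
   java.lang.Thread.State: RUNNABLE
	at org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)

"Finalizer" #3 WAITING
EOF
# Thread entries in jstack output are blank-line delimited, so an awk range
# pattern prints just the stack of the thread of interest.
awk '/^"Thread-16"/,/^$/' /tmp/jstack-sample.txt
```

This prints only the `Thread-16` entry, the equivalent of clicking that row in the UI.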
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> -z
>>>>>>
>>>>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>>>>> Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
>>>>>>
>>>>>> > Do thread dumps continuously, per specific period (like 1s), and see
>>>>>> > the change of stack / lock for each thread. (This is not easy to do
>>>>>> > in the UI, so doing it manually may be the only option. Not sure
>>>>>> > whether the Spark UI provides the same; I haven't used it at all.)
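The periodic capture suggested here can be sketched as a small loop (the PID variable is a placeholder, and the count is kept small for illustration; on a real host it would come from `jps -l`):

```shell
# Sketch of periodic thread-dump capture: one dump per second. PID is an
# assumed placeholder; COUNT would normally be 60 or more.
PID="${SPARK_DRIVER_PID:-$$}"
COUNT=3
i=1
while [ "$i" -le "$COUNT" ]; do
  # Fall back to an empty file if jstack is unavailable or PID is not a JVM,
  # so the loop itself still demonstrates the cadence.
  jstack -l "$PID" > "dump_${i}.txt" 2>/dev/null || : > "dump_${i}.txt"
  sleep 1
  i=$((i + 1))
done
echo "captured $COUNT dumps"
```

Comparing consecutive `dump_N.txt` files (e.g. with `diff`) shows which threads stay parked on the same lock across seconds, which is the signal being described.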
>>>>>> >
>>>>>> > It will tell which thread is being blocked (even if it's shown as
>>>>>> > running) and which point to look at.
>>>>>> >
>>>>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijing09@gmail.com> wrote:
>>>>>> >
>>>>>> > > Once I do a thread dump, what should I be looking for to tell
>>>>>> > > where it is hanging? Seeing a lot of timed_waiting and waiting on
>>>>>> > > the driver. The driver is also being blocked by the Spark UI. If
>>>>>> > > there are no tasks, is there a point to doing a thread dump of the
>>>>>> > > executors?
>>>>>> > >
>>>>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somogyi@gmail.com> wrote:
>>>>>> > >
>>>>>> > >> The simplest way is to do a thread dump, which doesn't require
>>>>>> > >> any fancy tool (it's available in the Spark UI).
>>>>>> > >> Without a thread dump it's hard to say anything...
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethorpe1@aol.com.invalid> wrote:
>>>>>> > >>
>>>>>> > >>> Here is another tool I use, Logic Analyser (at 7:55):
>>>>>> > >>> https://youtu.be/LnzuMJLZRdU
>>>>>> > >>>
>>>>>> > >>> You could take some suggestions for improving the performance of
>>>>>> > >>> queries:
>>>>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> Jane thorpe
>>>>>> > >>> janethorpe1@aol.com
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> -----Original Message-----
>>>>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>>>>> > >>> To: janethorpe1 <janethorpe1@aol.com>; mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>>>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> This tool may be useful for you to troubleshoot your problems.
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>>
>>>>>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> "APM tools typically use a waterfall-type view to show the
>>>>>> > >>> blocking time of different components cascading through the
>>>>>> > >>> control flow within an application.
>>>>>> > >>> These types of visualizations are useful, and AppOptics has them,
>>>>>> > >>> but they can be difficult to understand for those of us without a
>>>>>> > >>> PhD."
>>>>>> > >>>
>>>>>> > >>> Especially helpful if you want to understand through
>>>>>> > >>> visualisation and you do not have a PhD.
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> Jane thorpe
>>>>>> > >>> janethorpe1@aol.com
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> -----Original Message-----
>>>>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>>>>> > >>> To: mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>>>>> > >>> CC: user <user@spark.apache.org>
>>>>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>>>>> > >>>
>>>>>> > >>> You seem to be implying the error is intermittent.
>>>>>> > >>> You seem to be implying data is being ingested via JDBC, so the
>>>>>> > >>> connection has proven itself to be working unless no data is
>>>>>> > >>> arriving from the JDBC channel at all. If no data is arriving,
>>>>>> > >>> then one could say it could be the JDBC.
>>>>>> > >>> If the error is intermittent, then it is likely that a resource
>>>>>> > >>> involved in processing is filling to capacity.
>>>>>> > >>> Try reducing the data ingestion volume and see if that completes,
>>>>>> > >>> then increase the data ingested incrementally.
>>>>>> > >>> I assume you have run the job on a small amount of data, so you
>>>>>> > >>> have completed your prototype stage successfully.
>>>>>> > >>>
>>>>>> > >>> ------------------------------
>>>>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>>>> > >>> Hi,
>>>>>> > >>>
>>>>>> > >>> Have you checked your JDBC connections from Spark to Oracle?
>>>>>> > >>> What is Oracle saying? Is it doing anything or hanging?
>>>>>> > >>>
>>>>>> > >>> set pagesize 9999
>>>>>> > >>> set linesize 140
>>>>>> > >>> set heading off
>>>>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
>>>>>> > >>> set heading on
>>>>>> > >>> column spid heading "OS PID" format a6
>>>>>> > >>> column process format a13 heading "Client ProcID"
>>>>>> > >>> column username  format a15
>>>>>> > >>> column sid       format 999
>>>>>> > >>> column serial#   format 99999
>>>>>> > >>> column STATUS    format a3 HEADING 'ACT'
>>>>>> > >>> column last      format 9,999.99
>>>>>> > >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
>>>>>> > >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
>>>>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>>>>> > >>> --
>>>>>> > >>> SELECT
>>>>>> > >>>           substr(a.username,1,15) "LOGIN"
>>>>>> > >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>>>>> > >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>>>>> > >>>         , substr(a.machine,1,10) HOST
>>>>>> > >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>>>>> > >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>>>>> > >>>         , substr(a.program,1,15) PROGRAM
>>>>>> > >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>>>>> > >>>         , (
>>>>>> > >>>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>>>>>> > >>>                 where ss.sid = a.sid and
>>>>>> > >>>                         sn.statistic# = ss.statistic# and
>>>>>> > >>>                         -- sn.name in ('session pga memory')
>>>>>> > >>>                         sn.name in ('session pga memory','session uga memory')
>>>>>> > >>>           ) AS total_memory
>>>>>> > >>>         , (b.block_gets + b.consistent_gets) TotGets
>>>>>> > >>>         , b.physical_reads phyRds
>>>>>> > >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>>>>> > >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
>>>>>> > >>> FROM
>>>>>> > >>>          v$process p
>>>>>> > >>>         ,v$session a
>>>>>> > >>>         ,v$sess_io b
>>>>>> > >>> WHERE
>>>>>> > >>> a.paddr = p.addr
>>>>>> > >>> AND p.background IS NULL
>>>>>> > >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>>>>> > >>> AND a.sid = b.sid
>>>>>> > >>> AND a.username is not null
>>>>>> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>>>>> > >>> --AND CURRENT_DATE - logon_time > 0
>>>>>> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
>>>>>> > >>> --AND (b.block_gets + b.consistent_gets) > 0
>>>>>> > >>> ORDER BY a.username;
>>>>>> > >>> exit
>>>>>> > >>>
>>>>>> > >>> HTH
>>>>>> > >>>
>>>>>> > >>> Dr Mich Talebzadeh
>>>>>> > >>>
>>>>>> > >>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> > >>>
>>>>>> > >>> http://talebzadehmich.wordpress.com
>>>>>> > >>>
>>>>>> > >>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>> > >>> responsibility for any loss, damage or destruction of data or any
>>>>>> > >>> other property which may arise from relying on this email's
>>>>>> > >>> technical content is explicitly disclaimed. The author will in no
>>>>>> > >>> case be liable for any monetary damages arising from such loss,
>>>>>> > >>> damage or destruction.
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>>
>>>>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijing09@gmail.com> wrote:
>>>>>> > >>>
>>>>>> > >>> Hi all,
>>>>>> > >>>
>>>>>> > >>> I am on spark 2.4.4 and using scala 2.11.12, running cluster
>>>>>> > >>> mode on mesos. I am ingesting from an oracle database using
>>>>>> > >>> spark.read.jdbc. I am seeing a strange issue where spark just
>>>>>> > >>> hangs and does nothing, not starting any new tasks. Normally this
>>>>>> > >>> job finishes in 30 stages, but sometimes it stops at 29 completed
>>>>>> > >>> stages and doesn’t start the last stage. The spark job is idling
>>>>>> > >>> and there is no pending or active task. What could be the
>>>>>> > >>> problem? Thanks.
>>>>>> > >>> --
>>>>>> > >>> Cheers,
>>>>>> > >>> Ruijing Li
>>>>>> > >>>
>>>>>> > >>> --
>>>>>> > > Cheers,
>>>>>> > > Ruijing Li
>>>>>> > >
>>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Ruijing Li
>>>>>
>>>> --
>>>> Cheers,
>>>> Ruijing Li
>>>>
>>> --
>>> Cheers,
>>> Ruijing Li
>>>
>> --
> Cheers,
> Ruijing Li
>
