spark-user mailing list archives

From Ruijing Li <liruijin...@gmail.com>
Subject Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
Date Tue, 21 Apr 2020 17:38:29 GMT
After refreshing a couple of times, I notice the lock is being swapped
between these 3 threads. The other 2 are blocked by whichever thread holds
the lock, in a cycle: 160 has the lock -> 161 -> 159 -> 160
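
A lock whose holder keeps changing is usually ordinary contention (the threads take turns on the same monitor) rather than a true deadlock, which by definition never makes progress. One way to check directly, instead of eyeballing the UI, is to ask the JVM itself through `ThreadMXBean`. This is a minimal plain-JDK sketch, not anything Spark provides; you would run it (or the equivalent via JMX) inside the driver JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null unless there is a cycle of
        // threads each blocked waiting on a monitor/synchronizer held by
        // the next -- i.e. a genuine deadlock, not mere contention.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No JVM-level deadlock detected");
        } else {
            for (ThreadInfo ti : mx.getThreadInfo(ids)) {
                System.out.println(ti.getThreadName() + " blocked on "
                        + ti.getLockName() + " held by "
                        + ti.getLockOwnerName());
            }
        }
    }
}
```

If this prints nothing but "No JVM-level deadlock detected" while the job is hung, the SparkUI acceptor threads are a red herring and the cause is elsewhere (e.g. a blocking I/O call, which monitor-deadlock detection cannot see).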

On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijing09@gmail.com> wrote:

> In thread dump, I do see this
> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>
> Could the fact that 160 has the monitor but is not running be causing a
> deadlock preventing the job from finishing?
>
> I do see my Finalizer and main threads are waiting. I don’t see any other
> threads from 3rd party libraries or my code in the dump. The Spark
> context cleaner is in timed waiting.
>
> Thanks
>
>
> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijing09@gmail.com> wrote:
>
>> Strangely enough, I found an old issue that is exactly the same as mine:
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>
>> However, I’m using Spark 2.4.4, so the issue should have been solved by now.
>>
>> Like the user in the JIRA issue I am using Mesos, but I am reading from
>> Oracle instead of writing to Cassandra and S3.
>>
>>
>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezhang@outlook.com> wrote:
>>
>>> The thread dump table in the Spark UI can provide some clues for finding
>>> thread lock issues, such as:
>>>
>>>   Thread ID | Thread Name                  | Thread State | Thread Locks
>>>   13        | NonBlockingInputStreamThread | WAITING      | Blocked by
>>> Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
>>>   48        | Thread-16                    | RUNNABLE     |
>>> Monitor(jline.internal.NonBlockingInputStream@103008951})
>>>
>>> And each thread row shows its call stack when clicked, so you can check
>>> the root cause of the held lock like this (Thread 48 of the above):
>>>
>>>   org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>>
>>> org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>>
>>> org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>>
>>> org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>>   jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>>   <snip...>
>>>
>>> Hope it can help you.
>>>
>>> --
>>> Cheers,
>>> -z
>>>
>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>> Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
>>>
>>> > Take thread dumps continuously, at a specific period (like 1s), and watch
>>> > the change of stack / lock for each thread. (This is not easy to do in the
>>> > UI, so doing it manually may be the only option. Not sure whether the
>>> > Spark UI provides the same; I haven't used it for this at all.)
>>> >
>>> > It will tell you which thread is being blocked (even if it's shown as
>>> > running) and which point to look at.
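
The manual approach above is usually a shell loop around `jstack <driver-pid>`. The same idea can be sketched in-process with plain JDK calls; this is a hypothetical helper for illustration, not a Spark API:

```java
import java.util.Map;

public class StackPoller {
    public static void main(String[] args) throws InterruptedException {
        // Capture all thread stacks a few times, one second apart, and print
        // each thread's state so a change (RUNNABLE -> BLOCKED) or a stack
        // frame that never moves between dumps stands out.
        for (int i = 0; i < 3; i++) {
            System.out.println("=== dump " + i + " ===");
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                System.out.println(t.getName() + " | " + t.getState());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
            Thread.sleep(1000);
        }
    }
}
```

A thread whose top frame is identical across every dump, despite being shown as RUNNABLE, is the one to suspect: RUNNABLE includes threads blocked in native I/O such as a socket read.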
>>> >
>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijing09@gmail.com>
>>> wrote:
>>> >
>>> > > Once I do a thread dump, what should I be looking for to tell where
>>> > > it is hanging? I'm seeing a lot of TIMED_WAITING and WAITING on the
>>> > > driver. The driver is also being blocked by the Spark UI. If there are
>>> > > no tasks, is there a point in taking thread dumps of the executors?
>>> > >
>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somogyi@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> The simplest way is to take a thread dump, which doesn't require any
>>> > >> fancy tool (it's available in the Spark UI).
>>> > >> Without a thread dump it's hard to say anything...
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethorpe1@aol.com.invalid>
>>> > >> wrote:
>>> > >>
>>> > >>> Here is another tool I use, a logic analyser (7:55):
>>> > >>> https://youtu.be/LnzuMJLZRdU
>>> > >>>
>>> > >>> You could also take some suggestions for improving the performance of
>>> > >>> your queries:
>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>> > >>>
>>> > >>>
>>> > >>> Jane thorpe
>>> > >>> janethorpe1@aol.com
>>> > >>>
>>> > >>>
>>> > >>> -----Original Message-----
>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>> > >>> To: janethorpe1 <janethorpe1@aol.com>; mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> This tool may be useful for you to troubleshoot your problems:
>>> > >>>
>>> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>> > >>>
>>> > >>>
>>> > >>> "APM tools typically use a waterfall-type view to show the blocking
>>> > >>> time of different components cascading through the control flow within
>>> > >>> an application.
>>> > >>> These types of visualizations are useful, and AppOptics has them, but
>>> > >>> they can be difficult to understand for those of us without a PhD."
>>> > >>>
>>> > >>> Especially helpful if you want to understand through visualisation and
>>> > >>> you do not have a PhD.
>>> > >>>
>>> > >>>
>>> > >>> Jane thorpe
>>> > >>> janethorpe1@aol.com
>>> > >>>
>>> > >>>
>>> > >>> -----Original Message-----
>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>> > >>> To: mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>> > >>> CC: user <user@spark.apache.org>
>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>> > >>>
>>> > >>> You seem to be implying the error is intermittent.
>>> > >>> You seem to be implying data is being ingested via JDBC. So the
>>> > >>> connection has proven itself to be working, unless no data is arriving
>>> > >>> from the JDBC channel at all. If no data is arriving, then one could
>>> > >>> say it could be the JDBC.
>>> > >>> If the error is intermittent, then it is likely that a resource
>>> > >>> involved in processing is filling to capacity.
>>> > >>> Try reducing the data ingestion volume and see if that completes, then
>>> > >>> increase the data ingested incrementally.
>>> > >>> I assume you have run the job on a small amount of data, so you have
>>> > >>> completed your prototype stage successfully.
>>> > >>>
>>> > >>> ------------------------------
>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>> > >>> wrote:
>>> > >>> Hi,
>>> > >>>
>>> > >>> Have you checked your JDBC connections from Spark to Oracle? What is
>>> > >>> Oracle saying? Is it doing anything or hanging?
>>> > >>>
>>> > >>> set pagesize 9999
>>> > >>> set linesize 140
>>> > >>> set heading off
>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
>>> > >>> set heading on
>>> > >>> column spid heading "OS PID" format a6
>>> > >>> column process format a13 heading "Client ProcID"
>>> > >>> column username  format a15
>>> > >>> column sid       format 999
>>> > >>> column serial#   format 99999
>>> > >>> column STATUS    format a3 HEADING 'ACT'
>>> > >>> column last      format 9,999.99
>>> > >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
>>> > >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>> > >>> --
>>> > >>> SELECT
>>> > >>>           substr(a.username,1,15) "LOGIN"
>>> > >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>> > >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>> > >>>         , substr(a.machine,1,10) HOST
>>> > >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>> > >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>> > >>>         , substr(a.program,1,15) PROGRAM
>>> > >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>> > >>>         , (
>>> > >>>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>>> > >>>                 where ss.sid = a.sid and
>>> > >>>                         sn.statistic# = ss.statistic# and
>>> > >>>                         -- sn.name in ('session pga memory')
>>> > >>>                         sn.name in ('session pga memory','session uga memory')
>>> > >>>           ) AS total_memory
>>> > >>>         , (b.block_gets + b.consistent_gets) TotGets
>>> > >>>         , b.physical_reads phyRds
>>> > >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>> > >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
>>> > >>> FROM
>>> > >>>          v$process p
>>> > >>>         ,v$session a
>>> > >>>         ,v$sess_io b
>>> > >>> WHERE
>>> > >>> a.paddr = p.addr
>>> > >>> AND p.background IS NULL
>>> > >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>> > >>> AND a.sid = b.sid
>>> > >>> AND a.username is not null
>>> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>> > >>> --AND CURRENT_DATE - logon_time > 0
>>> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
>>> > >>> --AND (b.block_gets + b.consistent_gets) > 0
>>> > >>> ORDER BY a.username;
>>> > >>> exit
>>> > >>>
>>> > >>> HTH
>>> > >>>
>>> > >>> Dr Mich Talebzadeh
>>> > >>>
>>> > >>> LinkedIn *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> > >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>> > >>>
>>> > >>> http://talebzadehmich.wordpress.com
>>> > >>>
>>> > >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> > >>> any loss, damage or destruction of data or any other property which
>>> > >>> may arise from relying on this email's technical content is explicitly
>>> > >>> disclaimed. The author will in no case be liable for any monetary
>>> > >>> damages arising from such loss, damage or destruction.
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijing09@gmail.com>
>>> > >>> wrote:
>>> > >>>
>>> > >>> Hi all,
>>> > >>>
>>> > >>> I am on Spark 2.4.4, using Scala 2.11.12, and running cluster mode on
>>> > >>> Mesos. I am ingesting from an Oracle database using spark.read.jdbc.
>>> > >>> I am seeing a strange issue where Spark just hangs and does nothing,
>>> > >>> not starting any new tasks. Normally this job finishes in 30 stages,
>>> > >>> but sometimes it stops at 29 completed stages and doesn't start the
>>> > >>> last stage. The Spark job is idling and there is no pending or active
>>> > >>> task. What could be the problem? Thanks.
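
[Editor's note: one failure mode consistent with an idle, taskless hang is a JDBC socket read that blocks forever because no read timeout is configured; for the Oracle thin driver the relevant connection property is reported to be `oracle.jdbc.ReadTimeout`, which you should verify for your driver version. A minimal JDK-only sketch of the underlying behaviour, using a stand-in "server" that goes silent:]

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Returns true if the read timed out rather than blocking forever.
    static boolean readTimedOut() throws IOException {
        // A server that accepts the connection but never sends a byte,
        // standing in for a database that has stopped responding.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread silent = new Thread(() -> {
                try {
                    Socket held = server.accept(); // hold open, send nothing
                    Thread.sleep(60_000);
                    held.close();
                } catch (Exception ignored) { }
            });
            silent.setDaemon(true);
            silent.start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                // Without this timeout, read() below would block indefinitely
                // -- the same shape as a driver stuck on a dead JDBC socket,
                // which shows up in a thread dump as a RUNNABLE thread parked
                // in a native socket read.
                client.setSoTimeout(500);
                try {
                    client.getInputStream().read();
                    return false;
                } catch (SocketTimeoutException e) {
                    return true;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimedOut()
                ? "read timed out instead of hanging"
                : "received data");
    }
}
```

[If this is the cause, the driver thread dump would show the last stage's task-launching thread stuck in a socket read under the Oracle driver's classes, matching the "29 of 30 stages, no active tasks" symptom.]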
>>> > >>> --
>>> > >>> Cheers,
>>> > >>> Ruijing Li
>>> > >>>
>>> > >>> --
>>> > > Cheers,
>>> > > Ruijing Li
>>> > >
>>>
>> --
>> Cheers,
>> Ruijing Li
>>
> --
> Cheers,
> Ruijing Li
>
-- 
Cheers,
Ruijing Li
