spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
Date Wed, 22 Apr 2020 02:12:04 GMT
If there are no third-party libraries in the dump, then why not share the
thread dump? (I mean the output of jstack.)

If we suspect a deadlock, the stack traces would be more helpful for finding
which thread acquired the lock and which other threads are waiting to
acquire it.
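`jstack -l <pid>` prints this directly from the command line. For collecting similar information from inside the JVM, here is a minimal sketch. It assumes a HotSpot-based JVM (Java 8+) and uses the standard `java.lang.management.ThreadMXBean` API to print each thread's state, the lock it waits on together with that lock's owner, the full stack, the monitors and `java.util.concurrent` locks it holds, and whether `findDeadlockedThreads()` sees a cycle. `LockDump` is a hypothetical class name. In a Spark job it would have to run inside the driver or executor JVM being examined.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class LockDump {
    /** Builds a jstack-like dump of all live threads: who holds which lock, who waits for it. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // (true, true): also report held monitors and held java.util.concurrent locks
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            out.append('"').append(ti.getThreadName()).append("\" id=")
               .append(ti.getThreadId()).append(' ').append(ti.getThreadState()).append('\n');
            if (ti.getLockName() != null) {
                // The lock this thread is blocked or waiting on, plus its owner if it has one
                out.append("    waiting on ").append(ti.getLockName());
                if (ti.getLockOwnerId() != -1) {
                    out.append(" owned by \"").append(ti.getLockOwnerName())
                       .append("\" id=").append(ti.getLockOwnerId());
                }
                out.append('\n');
            }
            // Print every frame; ThreadInfo.toString() would truncate the stack
            for (StackTraceElement frame : ti.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            // Monitors (synchronized blocks) this thread currently holds
            for (MonitorInfo m : ti.getLockedMonitors()) {
                out.append("    - locked ").append(m).append('\n');
            }
            // Ownable synchronizers (e.g. ReentrantLock) this thread currently holds
            for (LockInfo l : ti.getLockedSynchronizers()) {
                out.append("    - locked synchronizer ").append(l).append('\n');
            }
        }
        // Null when no threads form a monitor/synchronizer deadlock cycle
        long[] deadlocked = mx.findDeadlockedThreads();
        out.append(deadlocked == null
                ? "No deadlock detected\n"
                : "Deadlocked thread ids: " + Arrays.toString(deadlocked) + "\n");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```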

On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <liruijing09@gmail.com> wrote:

> After refreshing a couple of times, I notice the lock is being passed
> between these three threads. Whichever thread holds the lock blocks the
> other two, in a cycle: 160 has the lock -> 161 -> 159 -> 160.
>
> On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijing09@gmail.com> wrote:
>
>> In the thread dump, I do see this:
>> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
>> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>>
>> Could the fact that 160 has the monitor but is not running be causing a
>> deadlock preventing the job from finishing?
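Judging only from the snapshot quoted above, the lock moving 160 -> 161 -> 159 over time looks like contention rather than a deadlock. A deadlock needs a cycle in the waits-for graph at a single instant (A waits for B while B waits for A), whereas thread 160 is RUNNABLE and waits for nobody. The sketch below checks one snapshot for such a cycle. `WaitsFor`, `hasDeadlock` and `edges` are hypothetical names, and the "blocked thread -> lock owner" map is assumed to be copied by hand from the UI. Inside a live JVM, `ThreadMXBean.findDeadlockedThreads()` performs this kind of check for you.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WaitsFor {
    /**
     * Returns true if one thread-dump snapshot contains a deadlock, i.e. a cycle in the
     * waits-for graph. blockedBy maps a BLOCKED thread id to the id of the thread holding
     * the lock it wants; threads that are not blocked have no entry. A thread waits on at
     * most one lock, so following the chain from each thread is enough.
     */
    public static boolean hasDeadlock(Map<Long, Long> blockedBy) {
        for (Long start : blockedBy.keySet()) {
            Set<Long> seen = new HashSet<>();
            for (Long t = start; t != null; t = blockedBy.get(t)) {
                if (!seen.add(t)) {
                    return true; // came back to a thread already on this chain
                }
            }
        }
        return false;
    }

    /** Hypothetical helper: edges(a, b, c, d) builds {a -> b, c -> d}. */
    public static Map<Long, Long> edges(long... pairs) {
        Map<Long, Long> m = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            m.put(pairs[i], pairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Snapshot quoted above: 161 and 159 wait for 160; 160 waits for nobody.
        System.out.println(hasDeadlock(edges(161, 160, 159, 160))); // false
        // After the lock moves on: 160 and 159 wait for 161.
        System.out.println(hasDeadlock(edges(160, 161, 159, 161))); // false
        // A real deadlock: 1 waits for 2 while 2 waits for 1.
        System.out.println(hasDeadlock(edges(1, 2, 2, 1)));         // true
    }
}
```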
>>
>> I do see that the Finalizer thread and my main method are waiting. I don't
>> see any other threads from third-party libraries or my code in the dump. I
>> do see the Spark context cleaner in TIMED_WAITING.
>>
>> Thanks
>>
>>
>> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijing09@gmail.com> wrote:
>>
>>> Strangely enough, I found an old issue that is exactly the same as mine:
>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>>
>>> However, I'm using Spark 2.4.4, so the issue should have been solved by
>>> now.
>>>
>>> Like the user in the JIRA issue, I am using Mesos, but I am reading from
>>> Oracle instead of writing to Cassandra and S3.
>>>
>>>
>>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezhang@outlook.com> wrote:
>>>
>>>> The thread dump table in the Spark UI can provide some clues for finding
>>>> thread lock issues, for example:
>>>>
>>>>   Thread ID | Thread Name                  | Thread State | Thread Locks
>>>>   13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
>>>>   48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})
>>>>
>>>> Each thread row shows its call stack when clicked, so you can check the
>>>> root cause of a held lock, like this (thread 48 above):
>>>>
>>>>   org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>>>   org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>>>   jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>>>   <snip...>
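The walk described above (start at a blocked row, then open the lock owner's stack) can also be done in code. A minimal sketch using the standard `ThreadMXBean` API follows. `OwnerStack`, `ownerStackOf` and the holder/waiter demo are hypothetical. It only covers locks that have an owner (monitors and `ReentrantLock`-style locks); a thread waiting on something like a `CountDownLatch` reports no owner.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class OwnerStack {
    /**
     * Follows a blocked thread to the thread that owns the lock it wants and returns the
     * owner's full call stack, or null if the thread is not waiting on an owned lock.
     */
    public static String ownerStackOf(long blockedThreadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo blocked = mx.getThreadInfo(blockedThreadId); // lock info, no stack
        if (blocked == null || blocked.getLockOwnerId() == -1) {
            return null;
        }
        // Integer.MAX_VALUE asks for the owner's complete stack
        ThreadInfo owner = mx.getThreadInfo(blocked.getLockOwnerId(), Integer.MAX_VALUE);
        if (owner == null) {
            return null; // the owner exited between the two calls
        }
        StringBuilder sb = new StringBuilder();
        sb.append(blocked.getThreadName()).append(" is blocked on ").append(blocked.getLockName())
          .append(", held by ").append(owner.getThreadName())
          .append(" (").append(owner.getThreadState()).append(")\n");
        for (StackTraceElement frame : owner.getStackTrace()) {
            sb.append("  ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /** Demo: "holder" keeps a monitor while "waiter" blocks on it. */
    public static String demo() {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the demo is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // reached only after holder releases the monitor
            }
        }, "waiter");
        holder.setDaemon(true);
        waiter.setDaemon(true);
        try {
            holder.start();
            held.await();                 // holder now owns the monitor
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);         // wait until waiter is stuck on the monitor
            }
            return ownerStackOf(waiter.getId());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            release.countDown();          // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```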
>>>>
>>>> Hope it can help you.
>>>>
>>>> --
>>>> Cheers,
>>>> -z
>>>>
>>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>>> Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
>>>>
>>>> > Take thread dumps continuously at a fixed interval (e.g. every 1s) and
>>>> > watch how the stack / lock of each thread changes. (This is not easy to
>>>> > do in the UI, so doing it manually may be the only option. I'm not sure
>>>> > whether the Spark UI provides the same; I haven't used it at all.)
>>>> >
>>>> > It will tell you which thread is being blocked (even if it's shown as
>>>> > running) and which point to look at.
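This periodic-dump approach can be sketched with `ThreadMXBean`. The sketch takes several dumps a fixed interval apart and lists threads whose state and full stack never changed, which are candidates to look at. A thread shown as RUNNABLE but with an identical stack in every sample is flagged too. `StuckThreads`, `unchanged` and the demo are hypothetical names, and it only sees threads of the JVM it runs in. An identical stack is a hint, not proof: an idle pool thread waiting for work also looks unchanged.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

public class StuckThreads {
    /**
     * Takes `samples` thread dumps `intervalMs` apart and returns the names of threads whose
     * state and full stack were identical in every sample. Threads that start after the
     * first sample are ignored. Returns an empty set if interrupted.
     */
    public static Set<String> unchanged(int samples, long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long self = Thread.currentThread().getId();
        Map<Long, String> firstSig = new HashMap<>();  // thread id -> state + stack in sample 0
        Map<Long, String> names = new HashMap<>();
        Map<Long, Integer> matches = new HashMap<>();  // samples identical to sample 0
        for (int i = 0; i < samples; i++) {
            for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
                long id = ti.getThreadId();
                if (id == self) {
                    continue; // the sampling thread would otherwise look "stuck" to itself
                }
                String sig = ti.getThreadState() + " " + Arrays.toString(ti.getStackTrace());
                if (i == 0) {
                    firstSig.put(id, sig);
                    names.put(id, ti.getThreadName());
                    matches.put(id, 1);
                } else if (sig.equals(firstSig.get(id))) {
                    matches.merge(id, 1, Integer::sum);
                }
            }
            if (i + 1 < samples) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return new TreeSet<>();
                }
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<Long, Integer> entry : matches.entrySet()) {
            if (entry.getValue() == samples) {
                result.add(names.get(entry.getKey()));
            }
        }
        return result;
    }

    /** Demo: a thread parked on a latch shows the same stack in every sample. */
    public static Set<String> demo() {
        CountDownLatch gate = new CountDownLatch(1);
        Thread parked = new Thread(() -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "parked-demo");
        parked.setDaemon(true);
        parked.start();
        try {
            while (parked.getState() != Thread.State.WAITING) {
                Thread.sleep(5);
            }
            return unchanged(3, 50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new TreeSet<>();
        } finally {
            gate.countDown(); // let the demo thread exit
        }
    }

    public static void main(String[] args) {
        // Five dumps, one second apart
        System.out.println(unchanged(5, 1000));
    }
}
```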
>>>> >
>>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijing09@gmail.com> wrote:
>>>> >
>>>> > > Once I take a thread dump, what should I look for to tell where it is
>>>> > > hanging? I'm seeing a lot of TIMED_WAITING and WAITING on the driver. The
>>>> > > driver is also being blocked by the Spark UI. If there are no tasks, is
>>>> > > there any point in taking a thread dump of the executors?
>>>> > >
>>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somogyi@gmail.com> wrote:
>>>> > >
>>>> > >> The simplest way is to take a thread dump, which doesn't require any
>>>> > >> fancy tool (it's available in the Spark UI).
>>>> > >> Without a thread dump it's hard to say anything...
>>>> > >>
>>>> > >>
>>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethorpe1@aol.com.invalid> wrote:
>>>> > >>
>>>> > >>> Here is another tool I use: Logic Analyser 7:55
>>>> > >>> https://youtu.be/LnzuMJLZRdU
>>>> > >>>
>>>> > >>> You could take some suggestions for improving query performance from:
>>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>>> > >>>
>>>> > >>>
>>>> > >>> Jane thorpe
>>>> > >>> janethorpe1@aol.com
>>>> > >>>
>>>> > >>>
>>>> > >>> -----Original Message-----
>>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>>> > >>> To: janethorpe1 <janethorpe1@aol.com>; mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> This tool may help you troubleshoot your problems away.
>>>> > >>>
>>>> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>>> > >>>
>>>> > >>>
>>>> > >>> "APM tools typically use a waterfall-type view to show the blocking
>>>> > >>> time of different components cascading through the control flow within an
>>>> > >>> application.
>>>> > >>> These types of visualizations are useful, and AppOptics has them, but
>>>> > >>> they can be difficult to understand for those of us without a PhD."
>>>> > >>>
>>>> > >>> It is especially helpful if you want to understand things through
>>>> > >>> visualisation and you do not have a PhD.
>>>> > >>>
>>>> > >>>
>>>> > >>> Jane thorpe
>>>> > >>> janethorpe1@aol.com
>>>> > >>>
>>>> > >>>
>>>> > >>> -----Original Message-----
>>>> > >>> From: jane thorpe <janethorpe1@aol.com.INVALID>
>>>> > >>> To: mich.talebzadeh <mich.talebzadeh@gmail.com>; liruijing09 <liruijing09@gmail.com>; user <user@spark.apache.org>
>>>> > >>> CC: user <user@spark.apache.org>
>>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>>> > >>>
>>>> > >>> You seem to be implying the error is intermittent.
>>>> > >>> You seem to be implying data is being ingested via JDBC. So the
>>>> > >>> connection has proven itself to be working, unless no data is arriving
>>>> > >>> from the JDBC channel at all. If no data is arriving, then one could
>>>> > >>> suspect the JDBC connection.
>>>> > >>> If the error is intermittent, then it is likely that a resource involved
>>>> > >>> in processing is filling to capacity.
>>>> > >>> Try reducing the data ingestion volume and see if the job completes, then
>>>> > >>> increase the volume ingested incrementally.
>>>> > >>> I assume you have run the job on a small amount of data, so you have
>>>> > >>> completed your prototype stage successfully.
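The "reduce, then increase incrementally" procedure above can be sketched as a doubling ramp. It runs the job at start, 2*start, 4*start, ... rows and stops at the first size that does not complete. This is a minimal sketch under stated assumptions: `RampUp` and `firstFailingSize` are hypothetical names, and `completes` is a hypothetical hook (for example, a wrapper that runs the ingestion on a limited number of rows and reports whether it finished). It is not part of Spark's API.

```java
import java.util.function.LongPredicate;

public class RampUp {
    /**
     * Runs the job on start, 2*start, 4*start, ... rows (never more than max) and returns
     * the first size that does not complete, or -1 if every size tried completes.
     * `completes` is a hypothetical hook: "ingest this many rows, report whether it finished".
     */
    public static long firstFailingSize(long start, long max, LongPredicate completes) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        long size = start;
        while (size <= max) {
            if (!completes.test(size)) {
                return size;
            }
            if (size > max / 2) {
                break; // doubling again would exceed max (and could overflow)
            }
            size *= 2;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in job that only finishes for up to 1,000 rows
        long failing = firstFailingSize(100, 100_000, rows -> rows <= 1_000);
        System.out.println(failing); // 1600: sizes 100, 200, 400 and 800 completed
    }
}
```

Because the hang is described as intermittent, one passing run at a given size is weak evidence on its own. Repeating each size a few times inside the hook is one option.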
>>>> > >>>
>>>> > >>> ------------------------------
>>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>> > >>> Hi,
>>>> > >>>
>>>> > >>> Have you checked your JDBC connections from Spark to Oracle? What is
>>>> > >>> Oracle saying? Is it doing anything or hanging?
>>>> > >>>
>>>> > >>> set pagesize 9999
>>>> > >>> set linesize 140
>>>> > >>> set heading off
>>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
>>>> > >>> set heading on
>>>> > >>> column spid heading "OS PID" format a6
>>>> > >>> column process format a13 heading "Client ProcID"
>>>> > >>> column username  format a15
>>>> > >>> column sid       format 999
>>>> > >>> column serial#   format 99999
>>>> > >>> column STATUS    format a3 HEADING 'ACT'
>>>> > >>> column last      format 9,999.99
>>>> > >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
>>>> > >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
>>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>>> > >>> --
>>>> > >>> SELECT
>>>> > >>>           substr(a.username,1,15) "LOGIN"
>>>> > >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>>> > >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>>> > >>>         , substr(a.machine,1,10) HOST
>>>> > >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>>> > >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>>> > >>>         , substr(a.program,1,15) PROGRAM
>>>> > >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>>> > >>>         , (
>>>> > >>>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>>>> > >>>                 where ss.sid = a.sid and
>>>> > >>>                         sn.statistic# = ss.statistic# and
>>>> > >>>                         -- sn.name in ('session pga memory')
>>>> > >>>                         sn.name in ('session pga memory','session uga memory')
>>>> > >>>           ) AS total_memory
>>>> > >>>         , (b.block_gets + b.consistent_gets) TotGets
>>>> > >>>         , b.physical_reads phyRds
>>>> > >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>>> > >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
>>>> > >>> FROM
>>>> > >>>          v$process p
>>>> > >>>         ,v$session a
>>>> > >>>         ,v$sess_io b
>>>> > >>> WHERE
>>>> > >>> a.paddr = p.addr
>>>> > >>> AND p.background IS NULL
>>>> > >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>>> > >>> AND a.sid = b.sid
>>>> > >>> AND a.username is not null
>>>> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>>> > >>> --AND CURRENT_DATE - logon_time > 0
>>>> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
>>>> > >>> --AND (b.block_gets + b.consistent_gets) > 0
>>>> > >>> ORDER BY a.username;
>>>> > >>> exit
>>>> > >>>
>>>> > >>> HTH
>>>> > >>>
>>>> > >>> Dr Mich Talebzadeh
>>>> > >>>
>>>> > >>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> > >>>
>>>> > >>> http://talebzadehmich.wordpress.com
>>>> > >>>
>>>> > >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> > >>> any loss, damage or destruction of data or any other property which may
>>>> > >>> arise from relying on this email's technical content is explicitly
>>>> > >>> disclaimed. The author will in no case be liable for any monetary damages
>>>> > >>> arising from such loss, damage or destruction.
>>>> > >>>
>>>> > >>>
>>>> > >>>
>>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijing09@gmail.com> wrote:
>>>> > >>>
>>>> > >>> Hi all,
>>>> > >>>
>>>> > >>> I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on
>>>> > >>> Mesos. I am ingesting from an Oracle database using spark.read.jdbc. I am
>>>> > >>> seeing a strange issue where Spark just hangs and does nothing, without
>>>> > >>> starting any new tasks. Normally this job finishes in 30 stages, but
>>>> > >>> sometimes it stops at 29 completed stages and doesn't start the last stage.
>>>> > >>> The Spark job is idle and there are no pending or active tasks. What could
>>>> > >>> be the problem? Thanks.
>>>> > >>> --
>>>> > >>> Cheers,
>>>> > >>> Ruijing Li
>>>> > >>>
>>>> > >>> --
>>>> > > Cheers,
>>>> > > Ruijing Li
>>>> > >
>>>>
>>> --
>>> Cheers,
>>> Ruijing Li
>>>
>> --
>> Cheers,
>> Ruijing Li
>>
> --
> Cheers,
> Ruijing Li
>
