spark-dev mailing list archives

From Patrick Wendell <>
Subject Re: About Spark job web ui persist(JIRA-969)
Date Wed, 08 Jan 2014 00:13:37 GMT
Hey Sandy,

Do you know what the status is for YARN-321 and what version of YARN
it's targeted for? Also, is there any kind of documentation or API for
this? Does it control the presentation of the data itself (e.g. it
actually has its own UI)?

@Tom - having an optional history server sounds like a good idea.

One question is what format to use for storing the data and how the
persisted format relates to XML/HTML generation in the live UI. One
idea would be to add JSON as an intermediate format inside of the
current WebUI, and then any JSON page could be persisted and rendered
by the history server using the same code. Once a SparkContext exits
it could dump a series of named paths each with a JSON file. Then the
history server could load those paths and pass them through the second
rendering stage (JSON => XML) to create each page.
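
To make the two-stage idea concrete, here is a minimal Python sketch. It is
not Spark code: the function names (dump_pages, load_pages, render_page), the
one-JSON-file-per-page layout, and the sample page contents are all
hypothetical, chosen only to illustrate the flow described above.

```python
import json
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape


def dump_pages(pages, root):
    """On exit: write each named page's JSON to <root>/<name>.json (assumed layout)."""
    root = Path(root)
    for name, data in pages.items():
        path = root / f"{name}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, sort_keys=True))


def load_pages(root):
    """History server side: read every persisted JSON page back, keyed by its path."""
    root = Path(root)
    return {
        p.relative_to(root).with_suffix("").as_posix(): json.loads(p.read_text())
        for p in sorted(root.rglob("*.json"))
    }


def render_page(name, data):
    """Second rendering stage (JSON => markup), shared by live UI and history server."""
    rows = "".join(
        f"<tr><td>{escape(str(key))}</td><td>{escape(str(value))}</td></tr>"
        for key, value in sorted(data.items())
    )
    return f"<html><body><h1>{escape(name)}</h1><table>{rows}</table></body></html>"


# Hypothetical page contents a live UI might hold as its intermediate JSON.
live_pages = {
    "stages": {"completed": 3, "failed": 0},
    "environment": {"spark.master": "local[2]"},
}

# Simulate "context exits, then history server starts" using a temp directory.
with tempfile.TemporaryDirectory() as history_dir:
    dump_pages(live_pages, history_dir)
    restored_pages = load_pages(history_dir)
```

Because both the live UI and the history server call the same render_page on
the same JSON, a persisted page should render identically to the live one.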

It would be good for SPARK-969 to have a solid design doc before anyone
starts working on it.

- Patrick

On Tue, Jan 7, 2014 at 3:18 PM, Sandy Ryza <> wrote:
> As a side note, it would be nice to make sure that whatever is done here will
> work with the YARN Application History Server (YARN-321), a generic history
> server that functions similarly to MapReduce's JobHistoryServer.  It will
> eventually have the ability to store application-specific data.
> -Sandy
> On Tue, Jan 7, 2014 at 2:51 PM, Tom Graves <> wrote:
>> I don't think you want to save the html/xml files. I would rather see the
>> info saved into a history file in something like a JSON format that could
>> then be re-read so the web UI can display the info, hopefully without much
>> change to the UI parts. For instance, perhaps the history server could read
>> the file and populate the appropriate Spark data structures that the web UI
>> already uses.
>> I would suggest making the history server an optional server that could
>> be run on any node. That way, if the load on a particular node becomes
>> too much, it could be moved, but you could also run it on the same node as
>> the Master. All it really needs is to know where to get the history files
>> from and to have access to that location.
>> Hadoop actually has a history server for MapReduce which works very
>> similarly to what I mention above. One thing to keep in mind here is
>> security. You want to make sure that the history files can only be read by
>> users who have the appropriate permissions. The history server itself
>> could run as a superuser who has permission to serve up the files based
>> on the ACLs.
>> On Tuesday, January 7, 2014 8:06 AM, "Xia, Junluan" <>
>> wrote:
>> Hi all,
>> The Spark job web UI is not available once a job is over, but a persisted
>> job web UI would be convenient for developers when debugging. I have come
>> up with a draft for this issue.
>> 1. We could simply save the web pages in html/xml format
>> (stages/executors/storage/environment) to a certain location when the job
>> finishes.
>> 2. But it is not easy for users to review the job info with #1, so we
>> could build an extra job history service for developers.
>> 3. But where would we build this history service? On the Driver node or
>> the Master node?
>> Any suggestions about this improvement?
>> regards,
>> Andrew
