crunch-dev mailing list archives

From Chao Shi <>
Subject Re: About status web page
Date Wed, 27 Feb 2013 04:03:31 GMT
Yes, it is for debugging and monitoring.

I'm developing a complex pipeline (30+ MRs plus lots of joins). I have a
hard time understanding which parts of the pipeline take the most running
time and how much intermediate output each part produces. Crunch's
optimization work is great, but it makes the execution plan difficult to
understand. Each time I modify the pipeline, I have to dump the dot file,
run graphviz to generate a new picture, and check whether anything is wrong.
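The manual loop described above (dump the dot file, run graphviz, inspect the picture) can at least be scripted. A minimal sketch, assuming the Graphviz `dot` binary is installed and on the PATH; it shells out to `dot -Tsvg`. The class name `DotToSvg` and the file names are hypothetical, not part of Crunch.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Crunch class: renders a dot file to SVG with the graphviz CLI.
final class DotToSvg {

  // Builds the command line: dot -Tsvg <input> -o <output>
  static List<String> buildCommand(Path dotFile, Path svgFile) {
    return Arrays.asList("dot", "-Tsvg", dotFile.toString(), "-o", svgFile.toString());
  }

  // Runs the external `dot` binary (assumed to be on the PATH) and waits for it to finish.
  static void render(Path dotFile, Path svgFile) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(buildCommand(dotFile, svgFile))
        .inheritIO() // let dot's error messages show up on this console
        .start();
    int exit = p.waitFor();
    if (exit != 0) {
      throw new IOException("dot exited with status " + exit);
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("usage: java DotToSvg <input.dot> <output.svg>");
      return;
    }
    render(Paths.get(args[0]), Paths.get(args[1]));
  }
}
```

For example, `java DotToSvg pipeline.dot pipeline.svg` after each pipeline change. The same call could back the server-side rendering option, at the cost of launching one `dot` process per render.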

Regarding security, I'm not familiar with how Hadoop handles it. I will try
to reuse Hadoop's HttpServer (does it have anything to do with security?).
The bottom line is to disable this feature by default and let users enable
it at their own risk.

If this feature is enabled, the user can choose either a specified port or
a random unused port. I haven't figured out yet how the user would learn the
randomly picked port (via the log?). I will work on a prototype version
first and see whether the status page is generally useful.
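The disabled-by-default switch and the port question can be handled together. The sketch below rests on stated assumptions: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for Jetty or Hadoop's HttpServer, and the property names `crunch.status.enabled` and `crunch.status.port` are hypothetical, not existing Crunch settings. Binding to port 0 lets the OS pick an unused port, so concurrent jobs on one machine do not clash, and the actual port is then written to the log.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.logging.Logger;

final class StatusPage {
  private static final Logger LOG = Logger.getLogger(StatusPage.class.getName());

  // Hypothetical property names, not existing Crunch settings.
  static final String ENABLED_KEY = "crunch.status.enabled";
  static final String PORT_KEY = "crunch.status.port";

  // Returns null when the page is disabled (the default); otherwise a started server.
  static HttpServer startIfEnabled(Properties conf) {
    if (!Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"))) {
      return null;
    }
    // 0 means "any unused port"; a fixed value means "use exactly this port".
    int port = Integer.parseInt(conf.getProperty(PORT_KEY, "0"));
    HttpServer server;
    try {
      server = HttpServer.create(new InetSocketAddress(port), 0);
    } catch (IOException e) {
      throw new UncheckedIOException("could not bind status page port " + port, e);
    }
    server.createContext("/", exchange -> {
      // Placeholder body; a real page would render the plan and job status.
      byte[] body = "pipeline status goes here".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    // The bound port is only known after binding, so report it in the log.
    LOG.info("Status page listening on port " + server.getAddress().getPort());
    return server;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(ENABLED_KEY, "true");
    HttpServer server = startIfEnabled(conf);
    server.stop(0);
  }
}
```

Logging the port is the simplest answer to "how does the user find it"; anything fancier (writing it to a file, printing a URL) would be a design decision beyond this sketch.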

On Wed, Feb 27, 2013 at 2:30 AM, Matthias Friedrich <> wrote:

> Hi Chao,
> sounds interesting - just a couple of things that come to mind:
> Is this intended as a debugging aid or for operational monitoring?
> A Crunch job is a temporary thing; to me this doesn't sound like a
> good match for a web service because it disappears after a (possibly
> short) time. Also, when multiple jobs are executed concurrently from
> the same machine, you can't work with a well-known port, you'd have to
> pick an unused port for each job.
> It also looks to me like this has security implications? Right now,
> Crunch is just a client library and we're part of Hadoop's security
> framework. A web service is something we might have to secure in some way.
> Regards,
>   Matthias
> On Tuesday, 2013-02-26, Chao Shi wrote:
> > Hi Crunch Devs,
> >
> > I'm interested in adding a web status page to Crunch. I'm working on a
> > prototype first, which simply runs a Jetty server and renders the dot file
> > produced by DotFileWriter in the browser. The dot rendering is done by
> > viz.js <>. It can successfully render the plan into SVG.
> >
> > I hit two issues with viz.js:
> >
> > 1. The license of viz.js is unclear. It is compiled from the GraphViz source
> > code with Emscripten, and GraphViz is under the Eclipse Public License 1.0.
> >
> > 2. viz.js is big and slow. It is 1.4 MB of compressed JS, and it takes 1 or 2
> > seconds on my laptop to render my pipeline (30+ MRs). I think it would be good
> > to have the graph refresh frequently and show the running status of the
> > pipeline (i.e. whether each MR is done or not), so this rendering time would
> > be too slow.
> >
> > If viz.js is not an option, another approach is to call the graphviz command
> > on the server side. I can't find any pure Java implementation of graphviz.
> >
> > Looking forward to your advice.
> >
> > Thanks,
> > Chao
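The proposal above wants the graph to refresh and show whether each MR is done. Whichever renderer is used, that status could be encoded in the dot text itself. A minimal sketch, assuming a hypothetical `JobState` enum and plain job names (this is not Crunch's job model or DotFileWriter's actual output): each job becomes a filled node colored by its state. It only changes what gets rendered; it does not make viz.js or `dot` any faster.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StatusDot {
  // Hypothetical job states; Crunch's own job model may differ.
  enum JobState { PENDING, RUNNING, SUCCEEDED, FAILED }

  // Maps each state to a Graphviz color name.
  static String color(JobState s) {
    switch (s) {
      case RUNNING: return "yellow";
      case SUCCEEDED: return "green";
      case FAILED: return "red";
      default: return "white"; // PENDING
    }
  }

  // Emits one filled node per job plus the given dependency edges (from -> to).
  // Assumes job names contain no double quotes, since they are not escaped.
  static String toDot(Map<String, JobState> jobs, List<String[]> edges) {
    StringBuilder sb = new StringBuilder("digraph pipeline {\n  node [style=filled];\n");
    for (Map.Entry<String, JobState> e : jobs.entrySet()) {
      sb.append("  \"").append(e.getKey()).append("\" [fillcolor=")
        .append(color(e.getValue())).append("];\n");
    }
    for (String[] edge : edges) {
      sb.append("  \"").append(edge[0]).append("\" -> \"").append(edge[1]).append("\";\n");
    }
    return sb.append("}\n").toString();
  }

  public static void main(String[] args) {
    Map<String, JobState> jobs = new LinkedHashMap<>();
    jobs.put("MR1", JobState.SUCCEEDED);
    jobs.put("MR2", JobState.RUNNING);
    System.out.print(toDot(jobs, Collections.singletonList(new String[] {"MR1", "MR2"})));
  }
}
```

Regenerating this text on each refresh is cheap; the cost that remains is the layout and rendering step itself.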
