giraph-dev mailing list archives

From Eli Reisman
Subject Re: [jira] [Commented] (GIRAPH-315) giraph-site.xml isn't read on time
Date Tue, 11 Sep 2012 19:28:09 GMT

Interesting. Here's what I know from struggling with GIRAPH-214.

Configuration objects are created with "new", but the constructor populates
them from static state: when the class is loaded, the default resources
registered through the static addDefaultResource() calls (various Hadoop and
Giraph config files) are recorded. Once those are registered, any
"new Configuration()" call is auto-populated with the values read from them,
as far as I know.
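
To make the ordering dependency concrete, here is a minimal sketch in plain
Java. ToyConfiguration, DefaultResourceDemo and the key "example.key" are all
made up for illustration; this is not Hadoop's Configuration class, which is
more involved, so treat it only as a model of "instances are populated from
whatever defaults have been registered so far".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a config class with static default resources (NOT Hadoop's Configuration). */
class ToyConfiguration {
    // Shared by every instance: resources registered so far, in order.
    private static final List<Map<String, String>> DEFAULT_RESOURCES = new ArrayList<>();

    // Values copied from the default resources when this instance was constructed.
    private final Map<String, String> props = new HashMap<>();

    /** Registers a resource that every LATER instance will be populated from. */
    static synchronized void addDefaultResource(Map<String, String> resource) {
        DEFAULT_RESOURCES.add(new HashMap<>(resource));
    }

    /** Demo-only hook so the example can be re-run in the same JVM. */
    static synchronized void clearDefaultResources() {
        DEFAULT_RESOURCES.clear();
    }

    ToyConfiguration() {
        synchronized (ToyConfiguration.class) {
            for (Map<String, String> resource : DEFAULT_RESOURCES) {
                props.putAll(resource); // later resources override earlier ones
            }
        }
    }

    String get(String key, String fallback) {
        String value = props.get(key);
        return value != null ? value : fallback;
    }
}

class DefaultResourceDemo {
    /** Returns the value seen by a config built before and after registration. */
    static String[] run() {
        ToyConfiguration.clearDefaultResources();
        ToyConfiguration early = new ToyConfiguration();
        // Stands in for a static block that registers a site file (hypothetical key and value).
        ToyConfiguration.addDefaultResource(
            Collections.singletonMap("example.key", "from-site-file"));
        ToyConfiguration late = new ToyConfiguration();
        return new String[] {
            early.get("example.key", "built-in-default"),
            late.get("example.key", "built-in-default")
        };
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("before registration: " + seen[0]);
        System.out.println("after registration:  " + seen[1]);
    }
}
```

In this toy model the instance built before the registration falls back to
the built-in default, while the one built afterwards sees the site value,
which is the kind of early/late mismatch the JIRA describes.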

In GiraphRunner, GiraphJob is definitely created before any threads on any
mappers are run, master or not. As soon as the GiraphJob class is loaded,
the Configuration class, already populated with Hadoop constants from its
conf files, is additionally populated with "giraph-site.xml" values and the
-D command line opts passed into bin/giraph. The Configuration instance with
all that data is then wrapped in GiraphJob, which is wrapped in GiraphRunner,
which is passed to Hadoop for worker/master Mappers to get created from them
when the call to ToolRunner is made at the bottom of GiraphRunner. So we
should be good there.

However, in various unit tests all of this is done/not done or re-ordered
from the above scenario in various ways, and some magic goes on because
GiraphRunner implements Tool. In the tests where we just create a one-off
Configuration or one-off GiraphJob, funny stuff might occur. I think this
is also why both GiraphRunner and GiraphJob have the static block, to avoid
strange chicken-and-egg issues in GiraphRunner's Configuration.
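
The chicken-and-egg part comes from how Java runs static blocks: a class's
static initializer only runs when the class is first actively used, not when
the program starts. A small sketch of that timing, using made-up names
(InitLog, ToyJob, StaticInitOrderDemo) rather than the real Giraph classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Shared event log for the demo (hypothetical helper). */
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

/** Stands in for a class whose static block registers a site file. */
class ToyJob {
    static {
        // Runs once, the first time ToyJob is actively used.
        InitLog.EVENTS.add("ToyJob static block ran");
    }

    ToyJob() {
        InitLog.EVENTS.add("ToyJob constructed");
    }
}

class StaticInitOrderDemo {
    /** Runs the scenario once per JVM and returns a copy of the event log. */
    static synchronized List<String> run() {
        if (InitLog.EVENTS.isEmpty()) {
            // Anything read here happens before ToyJob's static block.
            InitLog.EVENTS.add("config read before ToyJob is touched");
            new ToyJob(); // first active use: triggers the static block
            InitLog.EVENTS.add("config read after ToyJob is loaded");
        }
        return new ArrayList<>(InitLog.EVENTS);
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

The first "config read" is logged before the static block runs, so code that
builds a one-off config before touching the job class would not yet see
whatever that block registers. Putting the same static block in both classes,
as described above, is one way to make sure it has run whichever class gets
used first.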

That's about all I know. I am working on a couple patches today but at least
one (thanks for the help) is for Giraph and will include getting my home
setup running so maybe I can reproduce this and figure out what the deal
is. Giraph's interactions with the Hadoop framework have always provided me
with many amusing hours of frustration and I expect that relationship to
continue ;)

On Tue, Sep 11, 2012 at 5:44 AM, Maja Kabiljo (JIRA) wrote:

> Maja Kabiljo commented on GIRAPH-315:
> -------------------------------------
> Yes, it is happening, but just in tests.
> Here are a few things I noticed while investigating this, maybe they'll
> make more sense to you since you worked a lot with Configuration already.
> For example, while in BspCase.setupConfiguration, configurations from other
> files (i.e. core-site.xml, mapred-default.xml...) are not visible either; I
> guess it looks for them in a different folder.
> On the master, the static block of GiraphJob gets called some steps after
> is called (this also happens when running benchmarks and
> examples). But it seems like when we execute "hadoop jar ..." it also gets
> called somewhere before that, not sure where.
> In tests, the config object has the same resources as the one we get with new
> Configuration(), which is otherwise not the case.
> > giraph-site.xml isn't read on time
> > ----------------------------------
> >
> >                 Key: GIRAPH-315
> >                 URL:
> >             Project: Giraph
> >          Issue Type: Bug
> >            Reporter: Maja Kabiljo
> >            Assignee: Maja Kabiljo
> >            Priority: Trivial
> >         Attachments: GIRAPH-315.patch
> >
> >
> > While running some tests I noticed that on the master I get different
> > values of some configuration parameters at the beginning of the execution
> > than later on. It turned out that giraph-site.xml gets added as a default
> > resource a bit later than it should.
> > This only happens when running tests.
