hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11223) Offer a read-only conf alternative to new Configuration()
Date Wed, 18 Jan 2017 15:42:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828293#comment-15828293 ]

Jason Lowe commented on HADOOP-11223:

bq. If the real problem is reloading all those XML files all the time, why not change that
behavior in Hadoop 3.x? At the very least, we could have some kind of mapping between classpath
and default configuration values, and only actually load the XML files when we saw a new classpath
which might cause us to load some different files.

That's an interesting idea.  Tackling the *-default.xml files would get us a long way, since
we could hopefully avoid not only parsing them for each new Configuration object but also
invalidating them in every existing Configuration object every time a new default resource
is added.  The *-site.xml files would still need to be parsed, which can also be expensive.
To apply a similar approach to those, we'd have to snapshot not only the classpath but also
the sizes and modification timestamps of the relevant resources located on that classpath.
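
The snapshot idea could be sketched roughly as follows. This is only an illustration in plain JDK code, not Hadoop's Configuration implementation: the class name ResourceSnapshotCache, the load() method, and the caller-supplied parser function are all hypothetical. The cache key is built from each resource's path, size, and modification time, so the expensive parse runs again only when one of those changes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: reuse parsed properties while the resource snapshot is unchanged. */
final class ResourceSnapshotCache {

    // Key: fingerprint of all resources. Value: immutable parsed properties.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    /** Builds a fingerprint from path, size, and modification time of each resource. */
    static String fingerprint(List<Path> resources) {
        StringBuilder sb = new StringBuilder();
        for (Path p : resources) {
            try {
                sb.append(p.toAbsolutePath())
                  .append('|').append(Files.size(p))
                  .append('|').append(Files.getLastModifiedTime(p).toMillis())
                  .append(';');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return sb.toString();
    }

    /**
     * Returns the cached result for the current snapshot, invoking the
     * (assumed expensive) parser only when the fingerprint has not been seen.
     */
    Map<String, String> load(List<Path> resources,
                             Function<List<Path>, Map<String, String>> parser) {
        return cache.computeIfAbsent(fingerprint(resources),
            key -> Collections.unmodifiableMap(new HashMap<>(parser.apply(resources))));
    }
}
```

One limitation of this sketch: an edit that leaves both the size and the modification timestamp unchanged would go unnoticed, and entries for stale snapshots are never evicted.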

> Offer a read-only conf alternative to new Configuration()
> ---------------------------------------------------------
>                 Key: HADOOP-11223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11223
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Gopal V
>            Assignee: Varun Saxena
>              Labels: Performance
>         Attachments: HADOOP-11223.001.patch
> new Configuration() is called from several static blocks across Hadoop.
> This is incredibly inefficient, since each one of those involves primarily XML parsing
> at a point where the JIT won't be triggered and interpreter mode is essentially forced on
> the JVM.
> An alternative would be to offer a {{Configuration::getDefault()}} method
> which disallows any modifications.
> At the very least, such a method would need to be called from
> # org.apache.hadoop.io.nativeio.NativeIO::<clinit>()
> # org.apache.hadoop.security.SecurityUtil::<clinit>()
> # org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider::<clinit>()
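
For illustration, a shared read-only default of the kind the description asks for might look something like the sketch below. It uses only the JDK and does not reflect any actual Hadoop API or the attached patch: the ReadOnlyConf class, its methods, and the placeholder loadDefaults() (standing in for the XML parse) are assumptions. The holder class defers the load until the first getDefault() call, so static initializers would share one parsed instance instead of each constructing their own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a shared, read-only default configuration. */
final class ReadOnlyConf {

    // Initialization-on-demand holder: the defaults are loaded once,
    // lazily and thread-safely, on the first call to getDefault().
    private static final class Holder {
        static final ReadOnlyConf DEFAULT = new ReadOnlyConf(loadDefaults());
    }

    private final Map<String, String> props;

    private ReadOnlyConf(Map<String, String> props) {
        // Defensive copy wrapped in an unmodifiable view.
        this.props = Collections.unmodifiableMap(new HashMap<>(props));
    }

    /** Returns the single shared instance; callers cannot modify it. */
    static ReadOnlyConf getDefault() {
        return Holder.DEFAULT;
    }

    String get(String name, String defaultValue) {
        String value = props.get(name);
        return value != null ? value : defaultValue;
    }

    /** Mutation is rejected outright in this sketch. */
    void set(String name, String value) {
        throw new UnsupportedOperationException("read-only configuration");
    }

    // Placeholder for the expensive resource parsing; the entry is sample data.
    private static Map<String, String> loadDefaults() {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("io.file.buffer.size", "4096");
        return defaults;
    }
}
```

A static block would then call ReadOnlyConf.getDefault().get(...) rather than constructing a fresh configuration object.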

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
