mahout-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MAHOUT_OPTS not taking effect when running mahout locally
Date Wed, 04 Sep 2013 08:23:23 GMT
Here's what I am trying to say: in most other projects, such as
Hadoop, Pig, Sqoop, Flume, etc., PROJECT_OPTS is used to specify
"additional JVM arguments" rather than application arguments. The same
holds for Mahout: MAHOUT_OPTS was never intended as a way to pass
application options or configuration to the runtime, but rather to
control heap space, system properties, etc.

The change you're proposing moves it AFTER the class name, which would
break existing uses that rely on its current behavior. Instead, you
could introduce a new env-var, MAHOUT_APP_OPTS, which goes after the
class name and can accept all those -D generic configuration params.
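A minimal dry-run sketch of that ordering, assuming a hypothetical MAHOUT_APP_OPTS variable (it is only proposed here; bin/mahout does not read it) placed right after the class name and before the user's own arguments. Whether MahoutDriver accepts -D options at exactly that position has not been checked. The function name build_mahout_cmd is also made up for illustration; instead of exec'ing java it prints the would-be argument vector, one entry per line, so it runs without a JVM or a Mahout install.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the proposed launch order (not the real bin/mahout).
# MAHOUT_APP_OPTS and build_mahout_cmd are hypothetical names.
build_mahout_cmd() {
  local java="${JAVA:-java}"
  # The *_OPTS variables are deliberately unquoted so that a string such as
  # "-Xmx4g -Dfoo=bar" splits into separate arguments, as bin/mahout does.
  set -- "$java" ${JAVA_HEAP_MAX} ${MAHOUT_OPTS} \
    -classpath "${CLASSPATH:-.}" "$CLASS" \
    ${MAHOUT_APP_OPTS} "$@"
  # Everything before "$CLASS" is consumed by the JVM itself (heap size,
  # system properties); everything after it is handed to the main class.
  printf '%s\n' "$@"
}
```

Called with MAHOUT_OPTS="-Dhadoop.log.dir=/tmp/logs", MAHOUT_APP_OPTS="-Dmapred.reduce.tasks=1" and the program name seqdirectory, the log-dir property lands before -classpath (a JVM system property) while the mapred setting lands after the class name (an application argument), which is the split described above.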

On Sun, Sep 1, 2013 at 4:06 AM, Mario Rodriguez <mario.rodmag@gmail.com> wrote:
> What I'm passing in MAHOUT_OPTS are parameters of the same nature as those
> being set in bin/mahout:
>
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.dir=$MAHOUT_LOG_DIR"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.file=$MAHOUT_LOGFILE"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=512MB"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.output.compress=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.compress.map.output=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.factor=30"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.mb=1024"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.file.buffer.size=32786"
>
>
> I have a beefy dev box, and so can afford to tune those values.
>
> With the current exec call, those parameters are not picked up by the tasks
> launched by org.apache.mahout.driver.MahoutDriver.
>
> I can look at this in more detail when I'm back in the office on Monday and
> submit a JIRA ticket and patch (depending on how involved the right fix
> turns out to be).
>
> Cheers,
>
> Mario
>
>>
>>
>> On Sat, Aug 31, 2013 at 2:34 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> I don't quite know what it's used for, but that order change can be
>>> considered incompatible, mainly because in its current form it applies
>>> directly (and doubles up) to the JVM that launches Mahout, whereas the
>>> changed form turns it into application-only arguments.
>>>
>>> On Sun, Sep 1, 2013 at 1:05 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>>> > Hi Mario,
>>> >
>>> > Could you create a JIRA ticket for that, and submit your diff as a
>>> > patch if possible?
>>> > http://issues.apache.org/jira/browse/MAHOUT
>>> >
>>> > Best,
>>> > Gokhan
>>> >
>>> >
>>> > On Sat, Aug 31, 2013 at 8:56 PM, Mario Rodriguez <mario.rodmag@gmail.com> wrote:
>>> >
>>> >> Hi everyone,
>>> >>
>>> >> It seems MAHOUT_OPTS is not getting picked up when running mahout
>>> >> locally (MAHOUT_LOCAL=true). This can be fixed by switching the order
>>> >> in which MAHOUT_OPTS is passed in bin/mahout from:
>>> >>
>>> >> exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
>>> >>
>>> >> to:
>>> >>
>>> >> exec "$JAVA" $JAVA_HEAP_MAX -classpath "$CLASSPATH" $CLASS "$@" $MAHOUT_OPTS
>>> >>
>>> >>
>>> >> I can't guarantee it won't break some other way of running it; it does
>>> >> not look like it will, but I have not tested it.
>>> >>
>>> >> Cheers,
>>> >>
>>> >> Mario
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
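Applied to the settings Mario lists, the MAHOUT_APP_OPTS proposal amounts to splitting them into two groups. The sketch below does that with a hypothetical split_mahout_settings function. The rule it uses is an assumption for illustration, not something Mahout defines: hadoop.log.* properties and options that are not -D settings (such as -Xmx) stay in MAHOUT_OPTS for the launching JVM, and every other -D setting (mapred.*, io.*, ...) goes to the proposed MAHOUT_APP_OPTS. It overwrites both variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort -D settings into JVM options (MAHOUT_OPTS) and
# job configuration parameters (the proposed, not-yet-existing MAHOUT_APP_OPTS).
# The hadoop.log.* rule below is an illustrative assumption.
split_mahout_settings() {
  MAHOUT_OPTS=""
  MAHOUT_APP_OPTS=""
  local opt
  for opt in "$@"; do
    case "$opt" in
      -Dhadoop.log.*) MAHOUT_OPTS="$MAHOUT_OPTS $opt" ;;         # system property for the launching JVM (log location)
      -D*)            MAHOUT_APP_OPTS="$MAHOUT_APP_OPTS $opt" ;; # Hadoop job configuration parameter
      *)              MAHOUT_OPTS="$MAHOUT_OPTS $opt" ;;         # anything else, e.g. -Xmx, stays with the JVM
    esac
  done
  # Drop the leading space left by the concatenation above
  MAHOUT_OPTS="${MAHOUT_OPTS# }"
  MAHOUT_APP_OPTS="${MAHOUT_APP_OPTS# }"
}

split_mahout_settings -Dhadoop.log.dir=/tmp/logs -Dmapred.reduce.tasks=1 -Dio.sort.mb=1024
echo "$MAHOUT_OPTS"       # prints: -Dhadoop.log.dir=/tmp/logs
echo "$MAHOUT_APP_OPTS"   # prints: -Dmapred.reduce.tasks=1 -Dio.sort.mb=1024
```

A prefix rule keeps the helper short; a real launcher would more likely let users set the two variables directly rather than guess which group a setting belongs to.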



-- 
Harsh J
