mahout-user mailing list archives

From lastarsenal <lastarse...@163.com>
Subject Re: Re: Re: Re: Hadoop SSVD OutOfMemory Problem
Date Wed, 29 Apr 2015 03:07:30 GMT
Oh, I searched for a solution and referred to this page:
http://maven.apache.org/surefire/maven-surefire-plugin/examples/single-test.html


However, I ran "mvn -Dtest=LocalSSVDPCASparseTest test" and got an error, as below:


[INFO] Mahout Build Tools ................................. FAILURE [  1.309 s]
[INFO] Apache Mahout ...................................... SKIPPED
[INFO] Mahout Math ........................................ SKIPPED
[INFO] Mahout HDFS ........................................ SKIPPED
[INFO] Mahout Map-Reduce .................................. SKIPPED
[INFO] Mahout Integration ................................. SKIPPED
[INFO] Mahout Examples .................................... SKIPPED
[INFO] Mahout Math Scala bindings ......................... SKIPPED
[INFO] Mahout H2O backend ................................. SKIPPED
[INFO] Mahout Spark bindings .............................. SKIPPED
[INFO] Mahout Spark bindings shell ........................ SKIPPED
[INFO] Mahout Release Package ............................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.916 s
[INFO] Finished at: 2015-04-29T11:08:18+08:00
[INFO] Final Memory: 29M/965M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test)
on project mahout-buildtools: No tests were executed!  (Set -DfailIfNoTests=false to ignore
this error.) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following
articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException


How can I fix it?
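
For context, a likely cause: "-Dtest" is applied to every module in the
reactor, and the first module (mahout-buildtools) contains no test matching
the pattern, so surefire aborts the whole build. A minimal sketch of two
common workarounds; the module name "mr" below is an assumption, substitute
whichever module actually contains LocalSSVDPCASparseTest:

    # Option 1: don't fail modules where no test matches the pattern
    mvn -Dtest=LocalSSVDPCASparseTest -DfailIfNoTests=false test

    # Option 2: build and test only the owning module (-pl), plus its
    # in-repo dependencies (-am)
    mvn -pl mr -am -Dtest=LocalSSVDPCASparseTest test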

At 2015-04-29 10:54:35, "lastarsenal" <lastarsenal@163.com> wrote:
>OK, I have a github account and have cloned mahout into my local workdir. 
>
>
>I revised the code and ran "mvn test"; however, there are 3 test failures:
>Failed tests: 
>  LocalSSVDPCASparseTest.runPCATest1:87->runSSVDSolver:222->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null
>  LocalSSVDSolverDenseTest.testSSVDSolverPowerIterations1:59->runSSVDSolver:172->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null
>  LocalSSVDSolverSparseSequentialTest.testSSVDSolverPowerIterations1:69->runSSVDSolver:177->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null
>
>
>Now, my question is: how can I run a single specified test with maven? "mvn test" is
slow, so if I could do something like "mvn test LocalSSVDPCASparseTest", my efficiency would improve.
>
>At 2015-04-29 01:25:34, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
>>Just Dmitriy is fine.
>>
>>In order to create a pull request, please check out the process page
>>http://mahout.apache.org/developers/github.html. Note that it is written
>>for both committers and contributors, so you need to ignore the details for
>>committers.
>>
>>Basically, you just need a github account: clone (fork) apache/mahout into
>>your account, (optionally) create a patch branch, commit your modifications
>>there, and then use the github UI to create a pull request against
>>apache/mahout.
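
A minimal sketch of that flow as commands ("yourname" is a placeholder for
your github user; the branch name reuses the JIRA id from this thread):

    # fork apache/mahout in the github UI first, then:
    git clone https://github.com/yourname/mahout.git
    cd mahout
    git checkout -b MAHOUT-1700        # optional patch branch
    # ...edit the code...
    git commit -am "MAHOUT-1700: avoid OOM in ABtDenseOutJob"
    git push origin MAHOUT-1700
    # finally, open a pull request against apache/mahout in the github UI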
>>
>>thanks.
>>
>>-d
>>
>>On Mon, Apr 27, 2015 at 8:39 PM, lastarsenal <lastarsenal@163.com> wrote:
>>
>>> Hi, Dmitriy Lyubimov
>>>
>>>
>>> OK, I have submitted a JIRA issue at
>>> https://issues.apache.org/jira/browse/MAHOUT-1700
>>>
>>>
>>> I'm a newbie to mahout, so what should I do next for this issue? Thank
>>> you!
>>>
>>> At 2015-04-28 02:16:37, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
>>> >Thank you for this analysis. I can't immediately confirm this since it's
>>> >been a while, but this sounds credible.
>>> >
>>> >Do you mind filing a jira with all this information, and perhaps even
>>> >doing a PR on github?
>>> >
>>> >thank you.
>>> >
>>> >On Mon, Apr 27, 2015 at 4:32 AM, lastarsenal <lastarsenal@163.com> wrote:
>>> >
>>> >> Hi, All,
>>> >>
>>> >>
>>> >>      Recently, I tried mahout's hadoop ssvd (mahout-0.9 or mahout-1.0)
>>> >> job. There's a java heap space out-of-memory problem in ABtDenseOutJob.
>>> >> I found the reason; the ABtDenseOutJob map code is as below:
>>> >>
>>> >>
>>> >>     protected void map(Writable key, VectorWritable value, Context context)
>>> >>       throws IOException, InterruptedException {
>>> >>       Vector vec = value.get();
>>> >>       int vecSize = vec.size();
>>> >>       if (aCols == null) {
>>> >>         aCols = new Vector[vecSize];
>>> >>       } else if (aCols.length < vecSize) {
>>> >>         aCols = Arrays.copyOf(aCols, vecSize);
>>> >>       }
>>> >>       if (vec.isDense()) {
>>> >>         for (int i = 0; i < vecSize; i++) {
>>> >>           extendAColIfNeeded(i, aRowCount + 1);
>>> >>           aCols[i].setQuick(aRowCount, vec.getQuick(i));
>>> >>         }
>>> >>       } else if (vec.size() > 0) {
>>> >>         for (Vector.Element vecEl : vec.nonZeroes()) {
>>> >>           int i = vecEl.index();
>>> >>           extendAColIfNeeded(i, aRowCount + 1);
>>> >>           aCols[i].setQuick(aRowCount, vecEl.get());
>>> >>         }
>>> >>       }
>>> >>       aRowCount++;
>>> >>     }
>>> >>
>>> >>
>>> >> If the input is a RandomAccessSparseVector, as is usual with big data,
>>> >> its vec.size() is Integer.MAX_VALUE (2^31 - 1), so aCols = new
>>> >> Vector[vecSize] triggers the OutOfMemory problem. The obvious remedy is
>>> >> to enlarge every tasktracker's maximum heap:
>>> >> <property>
>>> >>   <name>mapred.child.java.opts</name>
>>> >>   <value>-Xmx1024m</value>
>>> >> </property>
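
As a back-of-the-envelope check of why that allocation cannot fit in a
typical task heap (a sketch assuming 8-byte references on a 64-bit JVM
without compressed oops; with compressed oops it is 4 bytes, still ~8.6 GB):

    public class ArraySizeSketch {
      public static void main(String[] args) {
        long slots = Integer.MAX_VALUE; // vec.size() of a RandomAccessSparseVector
        long bytes = slots * 8L;        // assumed bytes per array slot (one reference)
        // prints ~17.2 GB: the empty reference array alone, before any Vector exists
        System.out.printf("new Vector[%d] ~ %.1f GB%n", slots, bytes / 1e9);
      }
    }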
>>> >> However, if you are NOT a hadoop administrator or ops, you have no
>>> >> permission to modify that config. So I tried to modify the ABtDenseOutJob
>>> >> map code to support the RandomAccessSparseVector case: I use a HashMap to
>>> >> represent aCols instead of the original Vector[] aCols array. The
>>> >> modified code is as below:
>>> >>
>>> >>
>>> >> private Map<Integer, Vector> aColsMap = new HashMap<Integer, Vector>();
>>> >>     protected void map(Writable key, VectorWritable value, Context context)
>>> >>       throws IOException, InterruptedException {
>>> >>       Vector vec = value.get();
>>> >>       int vecSize = vec.size(); // still needed by the dense branch below
>>> >>       if (vec.isDense()) {
>>> >>         for (int i = 0; i < vecSize; i++) {
>>> >>           //extendAColIfNeeded(i, aRowCount + 1);
>>> >>           if (aColsMap.get(i) == null) {
>>> >>             aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
>>> >>           }
>>> >>           aColsMap.get(i).setQuick(aRowCount, vec.getQuick(i));
>>> >>           //aCols[i].setQuick(aRowCount, vec.getQuick(i));
>>> >>         }
>>> >>       } else if (vec.size() > 0) {
>>> >>         for (Vector.Element vecEl : vec.nonZeroes()) {
>>> >>           int i = vecEl.index();
>>> >>           //extendAColIfNeeded(i, aRowCount + 1);
>>> >>           if (aColsMap.get(i) == null) {
>>> >>             aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
>>> >>           }
>>> >>           aColsMap.get(i).setQuick(aRowCount, vecEl.get());
>>> >>           //aCols[i].setQuick(aRowCount, vecEl.get());
>>> >>         }
>>> >>       }
>>> >>       aRowCount++;
>>> >>     }
>>> >>
>>> >>
>>> >> Then the OutOfMemory problem goes away.
>>> >>
>>> >>
>>> >> Thank you!
>>> >>
>>> >>
>>>