mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: LanczosSolver Very Slow
Date Sun, 01 May 2011 01:17:32 GMT
And are we going to stop using 'instanceof'?

On Sat, Apr 30, 2011 at 3:34 PM, Paul Mahon <pmahon@decarta.com> wrote:
> Yep, I can confirm, it took me about 23 minutes to run it this time.
>
> On 04/30/2011 12:24 PM, Jake Mannix wrote:
>>
>> Paul, I ran your data this morning (on my laptop, pseudo-distributed mode)
>>
>> INFO: Program took 1388093 ms
>>
>> Yay!
>>
>>   -jake
>>
>> On Fri, Apr 29, 2011 at 10:24 PM, Jake Mannix<jake.mannix@gmail.com>
>>  wrote:
>>
>>> Great, let me know what you see.  I just checked in the last cleanup
>>> stuff.
>>>
>>>
>>> On Fri, Apr 29, 2011 at 9:55 PM, Paul Mahon<pmahon@decarta.com>  wrote:
>>>
>>>> Thanks for the quick work tracking this down. I'll give it another try
>>>> and see if it's faster now.
>>>>
>>>>
>>>> On 04/29/2011 09:46 PM, Jake Mannix wrote:
>>>>
>>>>> The commit I made a few minutes ago resolving MAHOUT-639 most likely
>>>>> has fixed this, but there are a few cleanup issues left coming
>>>>> shortly (instanceof DenseVector / instanceof
>>>>> SequentialAccessSparseVector at places where isDense() should be
>>>>> used, in case of NamedVector showing up), which avoid too many extra
>>>>> vector copies.
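[Editor's note] The instanceof-vs-isDense() distinction matters because a decorator like NamedVector wraps the real vector: an instanceof test on the wrapper misses the dense delegate, while a delegating isDense() answers correctly. A minimal self-contained sketch of that pattern — simplified stand-ins, not Mahout's real classes:

```java
// Minimal sketch (NOT Mahout's actual classes) of why an instanceof check
// misclassifies a decorated vector while a delegating isDense() does not.
interface Vector { boolean isDense(); }

class DenseVector implements Vector {
  final double[] values;
  DenseVector(double[] values) { this.values = values; }
  public boolean isDense() { return true; }
}

// Decorator in the style of Mahout's NamedVector: wraps a delegate vector.
class NamedVector implements Vector {
  final Vector delegate;
  final String name;
  NamedVector(Vector delegate, String name) { this.delegate = delegate; this.name = name; }
  public boolean isDense() { return delegate.isDense(); }  // delegates correctly
}

public class IsDenseSketch {
  // The buggy style of check: a NamedVector wrapping a DenseVector fails it.
  static boolean byInstanceof(Vector v) { return v instanceof DenseVector; }

  // The fixed check: delegation makes the wrapper answer truthfully.
  static boolean byIsDense(Vector v) { return v.isDense(); }

  public static void main(String[] args) {
    Vector wrapped = new NamedVector(new DenseVector(new double[]{1, 2, 3}), "doc1");
    System.out.println("instanceof DenseVector: " + byInstanceof(wrapped)); // false
    System.out.println("isDense():              " + byIsDense(wrapped));    // true
  }
}
```

The wrapped vector takes the slow sparse path under the instanceof check even though its payload is dense, which is consistent with the extra-copy slowdown discussed in this thread.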
>>>>>
>>>>> Paul, try trunk again?
>>>>>
>>>>>   -jake
>>>>>
>>>>> On Fri, Apr 29, 2011 at 2:32 PM, Jake Mannix<jake.mannix@gmail.com>
>>>>>  wrote:
>>>>>
>>>>>> Ok, I think I've found it.  It was this issue, and another similar
>>>>>> one.  Going to verify in a bit, and patch coming later this
>>>>>> afternoon!
>>>>>>
>>>>>>   -jake
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 29, 2011 at 1:42 PM, Jake Mannix<jake.mannix@gmail.com> wrote:
>>>>>>
>>>>>>> I think this is actually a bug (or series of bugs) somewhere, not a
>>>>>>> configuration issue, as shown by the fact that LanczosSolver
>>>>>>> finishes just fine in 10 min.
>>>>>>>
>>>>>>> I'm looking into it.  I've already found one problem
>>>>>>> (TimesSquaredJob checks instanceof DenseVector instead of calling
>>>>>>> isDense(), which gets the wrong answer for NamedVector instances);
>>>>>>> seeing if this is the slowdown cause now.
>>>>>>>
>>>>>>>   -jake
>>>>>>>
>>>>>>> On Fri, Apr 29, 2011 at 1:08 PM, Ken Krugler<kkrugler_lists@transpac.com> wrote:
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> One random guess here - the total in-use memory for all tasks
>>>>>>>> means you've run out of physical memory, and are using swap.
>>>>>>>>
>>>>>>>> E.g. the use of JVM memory by hash maps is one way this can happen.
>>>>>>>>
>>>>>>>> Once that occurs, your performance will become effectively 0, as
>>>>>>>> tasks compete with each other for memory.
>>>>>>>>
>>>>>>>> To see if this is happening, run 'top' on a slave and see how much
>>>>>>>> swap is being used.
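[Editor's note] Besides eyeballing 'top', on a Linux slave the same figure can be read from /proc/meminfo as SwapTotal minus SwapFree. A hypothetical helper (not part of Mahout or Hadoop), shown against a canned sample so the sketch is self-contained:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parse swap usage out of /proc/meminfo text.
// The meminfo content is passed in as a string so this stays self-contained.
public class SwapCheck {
  static long kbFor(String meminfo, String field) {
    Matcher m = Pattern.compile(field + ":\\s+(\\d+) kB").matcher(meminfo);
    if (!m.find()) throw new IllegalArgumentException(field + " not found");
    return Long.parseLong(m.group(1));
  }

  // kB of swap in use; a persistently large value while tasks run is the
  // thrashing described above.
  static long swapUsedKb(String meminfo) {
    return kbFor(meminfo, "SwapTotal") - kbFor(meminfo, "SwapFree");
  }

  public static void main(String[] args) {
    String sample = "SwapTotal:     2097148 kB\nSwapFree:      1048574 kB\n";
    System.out.println("swap used kB: " + swapUsedKb(sample)); // 1048574
    // On a real slave, read the file instead:
    //   new String(java.nio.file.Files.readAllBytes(
    //       java.nio.file.Paths.get("/proc/meminfo")))
  }
}
```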
>>>>>>>>
>>>>>>>> -- Ken
>>>>>>>>
>>>>>>>> On Apr 29, 2011, at 11:49am, Paul Mahon wrote:
>>>>>>>>
>>>>>>>>> I did dump the stack a few times. Here are a couple which came up
>>>>>>>>> most often:
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.indexOfKey(OpenIntDoubleHashMap.java:285)
>>>>>>>>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.get(OpenIntDoubleHashMap.java:202)
>>>>>>>>>     at org.apache.mahout.math.RandomAccessSparseVector.getQuick(RandomAccessSparseVector.java:129)
>>>>>>>>>     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:527)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>     at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:526)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:236)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>>>>
>>>>>>>>> "main" prio=10 tid=0x082cb800 nid=0x3060 runnable [0xb735a000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>     at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:192)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:241)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:232)
>>>>>>>>>     at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:194)
>>>>>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
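[Editor's note] The hot frames in these dumps sit in OpenIntDoubleHashMap underneath AbstractVector.assign. A rough self-contained sketch of why that access pattern is slow — using a plain java.util.HashMap as a stand-in for Mahout's OpenIntDoubleHashMap, with hypothetical method names: assigning by index over a hash-backed sparse vector pays one hash probe per index of the full cardinality, even when nearly every element is zero.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the hot path in the dumps above: assign() walks every index,
// and for a hash-backed sparse vector each getQuick is a hash-map probe.
public class AssignCost {
  static int probes = 0;

  static double getQuick(Map<Integer, Double> sparse, int i) {
    probes++;                       // one hash probe per element visited
    Double v = sparse.get(i);
    return v == null ? 0.0 : v;
  }

  // Index-by-index assign in the style of the stack trace: reads every one
  // of `cardinality` positions from the sparse operand.
  static double[] assign(double[] dense, Map<Integer, Double> sparse, int cardinality) {
    for (int i = 0; i < cardinality; i++) {
      dense[i] += getQuick(sparse, i);
    }
    return dense;
  }

  public static void main(String[] args) {
    Map<Integer, Double> sparse = new HashMap<>();
    sparse.put(3, 1.5);             // one nonzero out of a million slots
    assign(new double[1_000_000], sparse, 1_000_000);
    System.out.println("hash probes: " + probes); // 1000000, despite 1 nonzero
  }
}
```

Iterating only the nonzero entries of the sparse operand (or copying into a dense buffer once) avoids the per-index probes; the MAHOUT-639 fixes discussed in this thread move the code toward that behavior.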
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 04/29/2011 10:42 AM, Paul Mahon wrote:
>>>>>>>>>
>>>>>>>>>> The task logs don't have anything usable:
>>>>>>>>>>
>>>>>>>>>> *_syslog logs_*
>>>>>>>>>>
>>>>>>>>>> 2011-04-28 15:29:17,922 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>>>>>> 2011-04-28 15:29:18,114 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 50
>>>>>>>>>> 2011-04-28 15:29:18,124 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>>>>>>>>> 2011-04-28 15:29:18,266 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>>>>>>>>>
>>>>>>>>>> The driver log isn't very interesting either:
>>>>>>>>>>
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Job complete: job_201104221106_0128
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient: Counters: 19
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Job Counters
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched reduce tasks=50
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Rack-local map tasks=8
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Launched map tasks=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   FileSystemCounters
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_READ=201387210
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_READ=855491954
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=402789200
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20143593
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine output records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input records=12259359
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce shuffle bytes=181251087
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Spilled Records=20
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output bytes=201386850
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map input bytes=654091926
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Combine input records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Map output records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO mapred.JobClient:     Reduce input records=10
>>>>>>>>>> 11/04/28 14:26:45 INFO lanczos.LanczosSolver: 9 passes through the corpus so far...
>>>>>>>>>> 11/04/28 14:26:53 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>>>>>>>>> 11/04/28 14:26:58 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
>>>>>>>>>> 11/04/28 14:27:02 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 2.7694996270467187
>>>>>>>>>> 11/04/28 14:27:06 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 124.51350651732557
>>>>>>>>>> 11/04/28 14:27:10 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 254.94694717051294
>>>>>>>>>> 11/04/28 14:27:15 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 387.54002942185565
>>>>>>>>>> 11/04/28 14:27:19 INFO lanczos.LanczosSolver: Eigenvector 5 found with eigenvalue 501.70725582316834
>>>>>>>>>> 11/04/28 14:27:23 INFO lanczos.LanczosSolver: Eigenvector 6 found with eigenvalue 614.8114660750927
>>>>>>>>>> 11/04/28 14:27:28 INFO lanczos.LanczosSolver: Eigenvector 7 found with eigenvalue 700.1694351007611
>>>>>>>>>> 11/04/28 14:27:32 INFO lanczos.LanczosSolver: Eigenvector 8 found with eigenvalue 775.2520850269659
>>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: Eigenvector 9 found with eigenvalue 39441.64442962535
>>>>>>>>>> 11/04/28 14:27:36 INFO lanczos.LanczosSolver: LanczosSolver finished.
>>>>>>>>>> 11/04/28 14:27:36 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: ./svd/rawEigenvectors
>>>>>>>>>> 11/04/28 14:29:11 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>> 11/04/28 14:29:12 INFO mapred.JobClient: Running job: job_201104221106_0129
>>>>>>>>>> 11/04/28 14:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>> 11/04/28 16:08:23 INFO mapred.JobClient:  map 1% reduce 0%
>>>>>>>>>> ...
>>>>>>>>>> 11/04/29 05:32:24 INFO mapred.JobClient:  map 9% reduce 0%
>>>>>>>>>> 11/04/29 07:18:38 INFO mapred.JobClient:  map 10% reduce 0%
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The counters don't mean much to me either. Here's one set from
>>>>>>>>>> an incomplete mapper:
>>>>>>>>>>
>>>>>>>>>> *FileSystemCounters*
>>>>>>>>>>    HDFS_BYTES_READ     24,540,497
>>>>>>>>>>
>>>>>>>>>> *Map-Reduce Framework*
>>>>>>>>>>    Combine output records     0
>>>>>>>>>>    Map input records     115,815
>>>>>>>>>>    Spilled Records     0
>>>>>>>>>>    Map output bytes     0
>>>>>>>>>>    Map input bytes     4,401,563
>>>>>>>>>>    Combine input records     0
>>>>>>>>>>    Map output records     0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 04/29/2011 10:28 AM, Jake Mannix wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 29, 2011 at 9:45 AM, Paul Mahon<pmahon@decarta.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm running the LanczosSolver (from SVN two days ago) and it's
>>>>>>>>>>>> taking what I think is a very long time to run. The matrix I'm
>>>>>>>>>>>> using is very sparse. The command I used was:
>>>>>>>>>>>>
>>>>>>>>>>>> bin/mahout svd --input ./base --output ./svd --numRows 12259359
>>>>>>>>>>>> --numCols 2517335 --rank 10 --cleansvd true
>>>>>>>>>>>> -Dmapred.child.java.opts=-Xmx2048m
>>>>>>>>>>>>
>>>>>>>>>>>> The first part of the program runs pretty quickly, and in less
>>>>>>>>>>>> than 10 minutes it has found 10 eigenvectors. The second step
>>>>>>>>>>>> (cleansvd) has been running for 15 hours and is only 10% of the
>>>>>>>>>>>> way through the map tasks.
>>>>>>>>>
>>>>>>>>>>> 10 minutes for 10 eigenvectors is totally reasonable, depending
>>>>>>>>>>> on how sparse this matrix is.  cleansvd should take roughly the
>>>>>>>>>>> *same* amount of time as the original solving, because it's
>>>>>>>>>>> basically just taking each of the vectors and re-multiplying
>>>>>>>>>>> them by the original matrix to see how close to actual
>>>>>>>>>>> eigenvectors they are.  Do you have any log output to share?  I
>>>>>>>>>>> can try to reproduce this on a similarly sized matrix, but this
>>>>>>>>>>> definitely seems wrong.
>>>>>>>>>>>
>>>>>>>>>>>> Is this performance expected? Is there something I can do to
>>>>>>>>>>>> speed it up?
>>>>>>>>>>>
>>>>>>>>>>> No, and probably not.  Other than help me find the bug that's
>>>>>>>>>>> causing this.  I've never seen that behavior before.
>>>>>>>>>>>
>>>>>>>>>>>   -jake
>>>>>>>>>>>
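[Editor's note] Jake's description of cleansvd above — re-multiply each candidate eigenvector by the original matrix and see how close it comes to a true eigenvector — amounts to a residual check. A minimal sketch with plain arrays (hypothetical names, not Mahout's actual verification job):

```java
// Residual check sketch: a vector v is an eigenvector of A exactly when
// A v = lambda v for some scalar lambda, so ||A v - lambda v|| near zero
// (with lambda the Rayleigh quotient) means v passed verification.
public class EigenResidual {
  static double[] times(double[][] a, double[] v) {
    double[] r = new double[a.length];
    for (int i = 0; i < a.length; i++)
      for (int j = 0; j < v.length; j++)
        r[i] += a[i][j] * v[j];
    return r;
  }

  static double dot(double[] x, double[] y) {
    double s = 0;
    for (int i = 0; i < x.length; i++) s += x[i] * y[i];
    return s;
  }

  // ||A v - lambda v|| with lambda = (v . A v) / (v . v).
  static double residual(double[][] a, double[] v) {
    double[] av = times(a, v);
    double lambda = dot(v, av) / dot(v, v);
    double s = 0;
    for (int i = 0; i < v.length; i++) {
      double d = av[i] - lambda * v[i];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  public static void main(String[] args) {
    double[][] a = {{2, 0}, {0, 3}};
    System.out.println(residual(a, new double[]{1, 0})); // exact eigenvector: ~0
    System.out.println(residual(a, new double[]{1, 1})); // not an eigenvector: > 0
  }
}
```

Since this pass is one more multiply of each vector against the full matrix, it should indeed cost about the same as a Lanczos iteration — which is why the 15-hour cleansvd run pointed at a bug rather than expected behavior.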
>>>>>>>> --------------------------
>>>>>>>>
>>>>>>>> Ken Krugler
>>>>>>>> +1 530-210-6378
>>>>>>>> http://bixolabs.com
>>>>>>>> e l a s t i c   w e b   m i n i n g
>



-- 
Lance Norskog
goksron@gmail.com
