mahout-user mailing list archives

From Peyman Mohajerian <mohaj...@gmail.com>
Subject Re: Latent Semantic Analysis
Date Mon, 04 Jun 2012 07:11:02 GMT
So now LSA works, but the clustering of the two newsgroups is not accurate,
based on my subjective observation. I have two questions:
1) Does it make sense to use Canopy before the k-means step to get a better
idea of the number of clusters, or can the output from SSVD help in that
regard? Currently I pass the number of clusters as an input parameter.
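A cheap alternative to Canopy for choosing the number of clusters is to sweep k and keep the clustering with the best internal score, e.g. mean silhouette. A minimal sketch (toy 2-D points and hypothetical labelings, plain Python rather than the Mahout pipeline):

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over points."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        a = sum(same) / len(same) if same else 0.0
        # b: mean distance to the nearest other cluster
        other = {}
        for j, q in enumerate(points):
            if labels[j] != labels[i]:
                other.setdefault(labels[j], []).append(math.dist(p, q))
        b = min(sum(d) / len(d) for d in other.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated blobs: the 2-cluster labeling scores much higher
# than a mixed-up labeling, so sweeping k and keeping the best score works.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette(pts, [0, 0, 0, 1, 1, 1]) > silhouette(pts, [0, 1, 0, 1, 0, 1]))  # True
```

The same sweep applies to the SSVD-projected document vectors: run k-means for each candidate k and keep the k with the highest mean silhouette.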
2) What is a good way to assess the accuracy of the result? Is there some
data set that is already clustered with certain tuning parameters that I
could use to gain some confidence? Using newsgroups of different topics may
not be the best input, since we aren't doing a regular clustering based on
word count.
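Since the newsgroup labels are known, one concrete accuracy check is cluster purity against those labels. A minimal sketch (toy labels and assignments, not from an actual run):

```python
from collections import Counter

def cluster_purity(true_labels, cluster_ids):
    """Fraction of documents whose cluster's majority label matches their own."""
    clusters = {}
    for label, cid in zip(true_labels, cluster_ids):
        clusters.setdefault(cid, []).append(label)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)

# Toy example: 8 documents from two known newsgroups, three k-means clusters.
truth = ["graphics"] * 4 + ["atheism"] * 4
assigned = [0, 0, 0, 1, 1, 1, 2, 2]
print(cluster_purity(truth, assigned))  # 0.875
```

Purity is a reasonable sanity check here precisely because the 20 Newsgroups corpus comes pre-labeled by group.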

Thanks
Peyman

On Fri, Apr 6, 2012 at 1:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> Ok, cool.
>
> I think writing MR output into your input folder is not good practice
> in general in the Hadoop world, regardless of the job. Glad you got it
> resolved.
>
> On Fri, Apr 6, 2012 at 9:55 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> > Dmitriy,
> >
> > I did downgrade my hadoop and got the same error; however your last
> > suggestion worked, I moved the output path to a whole different directory
> > and this particular problem went away.
> >
> > Thanks Much,
> > Peyman
> >
> > On Thu, Apr 5, 2012 at 12:38 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >
> >> also I notice that you are using the output as a subfolder of your
> >> input? If so, it is probably going to create some mess; please don't
> >> use input and output folders that are nested w.r.t. each other. This
> >> is not expected.
> >>
> >> -d
> >>
> >> On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> >> > Ok, great, I'll give these ideas a try later today. The input is the
> >> > following line(s), which in my code sample were commented out using
> >> > ';' in Clojure. The first stage, the Q-job, finishes fine; it is the
> >> > second job that gets messed up. The output of the Q-job is at
> >> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job, but
> >> > BtJob is looking for the input in the wrong place -- it must be the
> >> > hadoop version, as you said.
> >> >
> >> > input path  #<Path hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
> >> > dd  #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
> >> > numCol  1000
> >> > numrow  15982
> >> >
> >> >
> >> > On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >
> >> >> Another idea I have is to try to run it from just the Mahout
> >> >> command line and see if it works with .205. If it does, it is
> >> >> definitely something about parameter passing / client hadoop
> >> >> classpath / etc.
> >> >>
> >> >> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> > also you are printing your input path -- what does it look like
> >> >> > in reality? Because this path that it complains about,
> >> >> > SSVDOutput/data, in fact should be the input path. That's what's
> >> >> > perplexing.
> >> >> >
> >> >> > We are talking hadoop job setup process here, nothing specific
> >> >> > to the solution itself. And job setup/directory management fails
> >> >> > for some reason.
> >> >> >
> >> >> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >> Any chance you could test it with its current dependency,
> >> >> >> 0.20.204? Or would that be hard to stage?
> >> >> >>
> >> >> >> A newer hadoop version is frankly all I can think of as the
> >> >> >> reason for this.
> >> >> >>
> >> >> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> >> >> >>> Hi Dmitriy,
> >> >> >>>
> >> >> >>> It is Clojure code from https://github.com/algoriffic/lsa4solr.
> >> >> >>> Of course I modified it to use the Mahout 0.6 distribution, also
> >> >> >>> running on hadoop-0.20.205.0. Here is the Clojure code that I
> >> >> >>> changed; the lines after 'decomposer (doto (.run ssvdSolver))'
> >> >> >>> still need modification b/c I'm not reading the eigenvalues and
> >> >> >>> eigenvectors from the solver correctly. Originally this code was
> >> >> >>> based on Mahout 0.4. I'm creating the matrix from Solr 3.1.0,
> >> >> >>> very similar to what was done in
> >> >> >>> https://github.com/algoriffic/lsa4solr.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>>
> >> >> >>> (defn decompose-svd
> >> >> >>>   [mat k]
> >> >> >>>   ;(println "input path " (.getRowPath mat))
> >> >> >>>   ;(println "dd " (into-array [(.getRowPath mat)]))
> >> >> >>>   ;(println "numCol " (.numCols mat))
> >> >> >>>   ;(println "numrow " (.numRows mat))
> >> >> >>>   (let [eigenvalues (new java.util.ArrayList)
> >> >> >>>         eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
> >> >> >>>         numCol (.numCols mat)
> >> >> >>>         config (.getConf mat)
> >> >> >>>         rawPath (.getRowPath mat)
> >> >> >>>         outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
> >> >> >>>         inputPath (into-array [rawPath])
> >> >> >>>         ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
> >> >> >>>         decomposer (doto (.run ssvdSolver))
> >> >> >>>         V (normalize-matrix-columns (.viewPart (.transpose eigenvectors)
> >> >> >>>                                               (int-array [0 0])
> >> >> >>>                                               (int-array [(.numCols mat) k])))
> >> >> >>>         U (mmult mat V)
> >> >> >>>         S (diag (take k (reverse eigenvalues)))]
> >> >> >>>     {:U U
> >> >> >>>      :S S
> >> >> >>>      :V V}))
> >> >> >>>
> >> >> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >>>
> >> >> >>>> Yeah, I don't see how it may have arrived at that error.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> Peyman,
> >> >> >>>>
> >> >> >>>> I need to know more -- it looks like you are using the embedded
> >> >> >>>> API, not the command line, so I need to see how you initialize
> >> >> >>>> the solver and also which version of the Mahout libraries you
> >> >> >>>> are using (your stack trace line numbers do not correspond to
> >> >> >>>> anything reasonable on current trunk).
> >> >> >>>>
> >> >> >>>> thanks.
> >> >> >>>>
> >> >> >>>> -d
> >> >> >>>>
> >> >> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >>>> > Hm, I never saw that and am not sure where this folder comes
> >> >> >>>> > from. Which hadoop version are you using? This may be a result
> >> >> >>>> > of incompatible support for multiple outputs in the newer
> >> >> >>>> > hadoop versions. I tested it with CDH3u0/u3 and it was fine.
> >> >> >>>> > This folder should normally appear in the conversation; I
> >> >> >>>> > suspect it is an internal hadoop thing.
> >> >> >>>> >
> >> >> >>>> > This is without me actually looking at the code, per the
> >> >> >>>> > stack trace.
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> >> >> >>>> >> Hi Guys,
> >> >> >>>> >> I'm now using ssvd for my LSA code and get the following
> >> >> >>>> >> error. At the time of the error, all I have under the
> >> >> >>>> >> 'SSVD-out' folder is:
> >> >> >>>> >> Q-job/QHat-m-00000
> >> >> >>>> >> Q-job/R-m-00000
> >> >> >>>> >> Q-job/_SUCCESS
> >> >> >>>> >> Q-job/part-m-00000.deflate
> >> >> >>>> >>
> >> >> >>>> >> I'm not clear where the '/data' folder is supposed to be
> >> >> >>>> >> set -- is it part of the output of the QJob? I don't see any
> >> >> >>>> >> error in the QJob.
> >> >> >>>> >>
> >> >> >>>> >> Thanks,
> >> >> >>>> >> SEVERE: java.io.FileNotFoundException: File does not exist:
> >> >> >>>> >> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
> >> >> >>>> >>    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:534)
> >> >> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
> >> >> >>>> >>    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:954)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:971)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
> >> >> >>>> >>    at java.security.AccessController.doPrivileged(Native Method)
> >> >> >>>> >>    at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >> >>>> >>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >> >> >>>> >>    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
> >> >> >>>> >>    at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
> >> >> >>>> >>    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:505)
> >> >> >>>> >>    at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:347)
> >> >> >>>> >>    at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
> >> >> >>>> >>    at lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
> >> >> >>>> >>    at lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
> >> >> >>>> >>    at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
> >> >> >>>> >>    at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
> >> >> >>>> >>    at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
> >> >> >>>> >>    at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
> >> >> >>>> >>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
> >> >> >>>> >>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >> >> >>>> >>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> >> >> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> >> >> >>>> >>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> >> >> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> >> >>>> >>    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >> >> >>>> >>
> >> >> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >>>> >>
> >> >> >>>> >>> For the third time: in the context of LSA, a faster and
> >> >> >>>> >>> hence perhaps better alternative to Lanczos is ssvd. Is
> >> >> >>>> >>> there any specific reason you want to use the Lanczos
> >> >> >>>> >>> solver in the context of LSA?
> >> >> >>>> >>>
> >> >> >>>> >>> -d
> >> >> >>>> >>>
> >> >> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> >> >> >>>> >>> > Hi Guys,
> >> >> >>>> >>> >
> >> >> >>>> >>> > Per your advice I upgraded to Mahout 0.6 and made a bunch
> >> >> >>>> >>> > of API changes, and in the meantime realized I had a bug
> >> >> >>>> >>> > with my input matrix: zero rows were read from Solr b/c
> >> >> >>>> >>> > multiple fields in Solr were indexed, not just the one I
> >> >> >>>> >>> > was interested in. That issue is fixed and I have a matrix
> >> >> >>>> >>> > with these dimensions: (.numCols mat) 1000, (.numRows mat)
> >> >> >>>> >>> > 15932 (or the transpose).
> >> >> >>>> >>> > Unfortunately I'm getting the error below now. In the
> >> >> >>>> >>> > context of some other Mahout algorithm there was a mention
> >> >> >>>> >>> > of '/tmp' vs '/_tmp' causing this issue, but in this
> >> >> >>>> >>> > particular case the matrix is in memory!! I'm using this
> >> >> >>>> >>> > google package: guava-r09.jar
> >> >> >>>> >>> >
> >> >> >>>> >>> > SEVERE: java.util.NoSuchElementException
> >> >> >>>> >>> >        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
> >> >> >>>> >>> >        at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
> >> >> >>>> >>> >        at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
> >> >> >>>> >>> >        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
> >> >> >>>> >>> >        at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > Any suggestion?
> >> >> >>>> >>> > Thanks,
> >> >> >>>> >>> > Peyman
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >>>> >>> >> Peyman,
> >> >> >>>> >>> >>
> >> >> >>>> >>> >>
> >> >> >>>> >>> >> Yes, what Ted said. Please take the 0.6 release. Also
> >> >> >>>> >>> >> try ssvd; it may benefit you in some regards compared to
> >> >> >>>> >>> >> Lanczos.
> >> >> >>>> >>> >>
> >> >> >>>> >>> >> -d
> >> >> >>>> >>> >>
> >> >> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> >> >> >>>> >>> >>> Hi Dmitriy & Others,
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>> Dmitriy, thanks for your previous response.
> >> >> >>>> >>> >>> I have a follow-up question on my LSA project. I have
> >> >> >>>> >>> >>> managed to upload 1,500 documents from two different
> >> >> >>>> >>> >>> newsgroups (one about graphics and one about atheism,
> >> >> >>>> >>> >>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to
> >> >> >>>> >>> >>> Solr. However my LanczosSolver in Mahout 0.4 does not
> >> >> >>>> >>> >>> find any eigenvalues (there are eigenvectors, as you can
> >> >> >>>> >>> >>> see in the logs below).
> >> >> >>>> >>> >>> The only thing I'm doing differently from
> >> >> >>>> >>> >>> https://github.com/algoriffic/lsa4solr is that I'm not
> >> >> >>>> >>> >>> using the 'Summary' field but rather the actual 'text'
> >> >> >>>> >>> >>> field in Solr. I'm assuming the issue is that the
> >> >> >>>> >>> >>> Summary field already removes the noise and makes the
> >> >> >>>> >>> >>> clustering work, and the raw index data does not; am I
> >> >> >>>> >>> >>> correct, or are there other potential explanations? For
> >> >> >>>> >>> >>> the desired rank I'm using values between 10-100 and
> >> >> >>>> >>> >>> looking for a number of clusters between 2-10 (different
> >> >> >>>> >>> >>> values for different trials), but always the same result
> >> >> >>>> >>> >>> comes out: no clusters found.
> >> >> >>>> >>> >>> If my issue is related to not having summarization done,
> >> >> >>>> >>> >>> how can that be done in Solr? I wasn't able to find a
> >> >> >>>> >>> >>> Summary field in Solr.
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>> Thanks
> >> >> >>>> >>> >>> Peyman
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
> >> >> >>>> >>> >>> INFO: LanczosSolver finished.
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>>
> >> >> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >> >>>> >>> >>>> In Mahout an LSA pipeline is possible with the
> >> >> >>>> >>> >>>> seqdirectory, seq2sparse and ssvd commands. The nuances
> >> >> >>>> >>> >>>> are understanding the dictionary format and the LLR
> >> >> >>>> >>> >>>> analysis of n-grams, and perhaps using a slightly
> >> >> >>>> >>> >>>> better lemmatizer than the default one.
> >> >> >>>> >>> >>>>
> >> >> >>>> >>> >>>> With the indexing part you are on your own at this
> >> >> >>>> >>> >>>> point.
> >> >> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <mohajeri@gmail.com> wrote:
> >> >> >>>> >>> >>>>
> >> >> >>>> >>> >>>>> Hi Guys,
> >> >> >>>> >>> >>>>>
> >> >> >>>> >>> >>>>> I'm interested in this work:
> >> >> >>>> >>> >>>>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html
> >> >> >>>> >>> >>>>>
> >> >> >>>> >>> >>>>> I looked at some of the comments and noticed that
> >> >> >>>> >>> >>>>> there was interest in incorporating it into Mahout,
> >> >> >>>> >>> >>>>> back in 2010. I'm also having issues running this code
> >> >> >>>> >>> >>>>> due to dependencies on an older version of Mahout.
> >> >> >>>> >>> >>>>>
> >> >> >>>> >>> >>>>> I was wondering if LSA is now directly available in
> >> >> >>>> >>> >>>>> Mahout? Also, if I upgrade to the latest Mahout, would
> >> >> >>>> >>> >>>>> this Clojure code work?
> >> >> >>>> >>> >>>>>
> >> >> >>>> >>> >>>>> Thanks
> >> >> >>>> >>> >>>>> Peyman
> >> >> >>>> >>> >>>>>
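A footnote on the nested input/output advice earlier in the thread: the check can be done client-side before submitting a job. A minimal sketch (plain path-string comparison in Python; the directory names are made up, and this does not consult HDFS itself):

```python
import os.path

def assert_not_nested(input_dir, output_dir):
    """Refuse to run a job whose input and output directories are nested."""
    a, b = os.path.abspath(input_dir), os.path.abspath(output_dir)
    common = os.path.commonpath([a, b])
    # If the common prefix equals one of the paths, one contains the other.
    if common in (a, b):
        raise ValueError("input and output must not be nested: %s vs %s" % (a, b))

assert_not_nested("/lsa4solr/matrix/transpose-120", "/lsa4solr/SSVD-out")  # fine
# assert_not_nested("/lsa4solr/matrix", "/lsa4solr/matrix/SSVD-out")  # would raise
```

Running a guard like this before constructing the solver would have turned the confusing mid-job FileNotFoundException into an immediate, readable error.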
