mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Latent Semantic Analysis
Date Mon, 04 Jun 2012 17:44:42 GMT
RE #2: I'd suggest reading the LSA papers (Deerwester's, Dumais's; they
wrote more than one) to see how they address efficacy analysis of LSA
there.
SSVD is nothing but an SVD method; Mahout's SSVD accuracy analysis is
part of Nathan Halko's dissertation (linked to under "Papers" here:
https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition).

RE #1: I am not sure I have read any work actually trying to find
clusters in LSA output, which may just mean I haven't read enough on
the topic. There's the EigenSpokes paper, which is pretty much devoted
to sphere-projected clusters produced by SVD on social data, but I
don't think they included LSA output in any of their claims. Still,
you may want to check that paper out. LSA is more about
recall/precision/semantic-distance hints (such as context-based
polysemy) than about topic clustering. However, *I think* that if
there are any eigenspoke-style "clusters" in the LSA output, they are
better projected onto the sphere first in order to detect them more
clearly (see hyperspherical coordinates). I never did the latter, so
that's just my guess; check out the papers for more info.
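For concreteness, the sphere projection could be sketched like this (an illustrative standalone example, not Mahout API; it simply L2-normalizes each row of the SVD output so that a downstream k-means sees angular separation rather than magnitude):

```java
import java.util.Arrays;

public class SphereProject {
    // L2-normalize each row vector (e.g. a document's coordinates in the
    // reduced SVD space) so it lies on the unit sphere.
    static double[][] projectToSphere(double[][] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            double norm = 0;
            for (double x : rows[i]) norm += x * x;
            norm = Math.sqrt(norm);
            out[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++)
                out[i][j] = norm == 0 ? 0 : rows[i][j] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two rows pointing in the same direction but with different magnitudes
        // collapse to the same point on the sphere.
        double[][] docs = {{3, 4}, {6, 8}};
        System.out.println(Arrays.deepToString(projectToSphere(docs)));
        // prints [[0.6, 0.8], [0.6, 0.8]]
    }
}
```

After this projection, clusters separated by direction (as in the eigenspokes observation) become compact in Euclidean distance as well.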

-d



On Mon, Jun 4, 2012 at 12:11 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
> So now LSA works, but clustering of the two newsgroups is not accurate
> based on my subjective observation. I have two questions:
> 1) Does it make sense to use Canopy before the k-means step to get a better
> idea of the number of clusters, or can the output from SSVD help in that
> regard? Currently I pass the number of clusters as an input parameter.
> 2) What is a good way to assess the accuracy of the result? Is there some
> data set that is already clustered with certain tuning parameters that I
> could use to gain some confidence? Using newsgroups on different topics may
> not be the best input, since we aren't doing a regular clustering based on
> word count.
>
> Thanks
> Peyman
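Since the 20 Newsgroups labels are known, one common sanity check for question 2 (my suggestion, not something proposed in the thread) is cluster purity: for each predicted cluster, take its majority true label and measure the fraction of documents covered overall. A minimal sketch:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Purity {
    // Purity = (sum over clusters of the majority true-label count) / total docs.
    // clusterOf[i] is the predicted cluster of document i; labelOf[i] its true label.
    static double purity(int[] clusterOf, String[] labelOf) {
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < clusterOf.length; i++) {
            counts.computeIfAbsent(clusterOf[i], c -> new HashMap<>())
                  .merge(labelOf[i], 1, Integer::sum);
        }
        int majority = 0;
        for (Map<String, Integer> m : counts.values())
            majority += Collections.max(m.values());
        return (double) majority / clusterOf.length;
    }

    public static void main(String[] args) {
        int[] clusters  = {0, 0, 0, 1, 1, 1};
        String[] labels = {"graphics", "graphics", "atheism",
                           "atheism", "atheism", "graphics"};
        // Each cluster gets 2 of its 3 docs right: purity = 4/6.
        System.out.println(purity(clusters, labels)); // prints 0.6666666666666666
    }
}
```

A purity near 1.0 on labeled data like the two newsgroups gives some confidence in the pipeline before moving to unlabeled corpora.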
>
> On Fri, Apr 6, 2012 at 1:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> Ok, cool.
>>
>> I think writing MR output into your input folder is bad practice in
>> the Hadoop world in general, regardless of the job. Glad you got it
>> resolved.
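A simple guard against this class of problem (a hypothetical standalone helper, not part of Mahout or Hadoop; it compares local-style path strings for illustration) is to refuse nested input/output directories up front:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathGuard {
    // True when one path is nested inside the other; such input/output
    // combinations should be rejected before submitting a job.
    static boolean nested(String a, String b) {
        Path pa = Paths.get(a).normalize();
        Path pb = Paths.get(b).normalize();
        return pa.startsWith(pb) || pb.startsWith(pa);
    }

    public static void main(String[] args) {
        System.out.println(nested("/lsa4solr/matrix", "/lsa4solr/matrix/SSVD-out")); // true
        System.out.println(nested("/lsa4solr/matrix", "/lsa4solr/out"));             // false
    }
}
```

Failing fast on nested paths avoids the job-setup confusion described later in this thread, where a solver ends up listing its own intermediate output as input.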
>>
>> On Fri, Apr 6, 2012 at 9:55 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> > Dmitriy,
>> >
>> > I did downgrade my hadoop and got the same error; however your last
>> > suggestion worked, I moved the output path to a whole different directory
>> > and this particular problem went away.
>> >
>> > Thanks Much,
>> > Peyman
>> >
>> > On Thu, Apr 5, 2012 at 12:38 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >
>> >> Also, I notice that you are using the output as a subfolder of your
>> >> input? If so, it is probably going to create a mess. Please don't
>> >> use input and output folders that are nested w.r.t. each other;
>> >> this is not expected.
>> >>
>> >> -d
>> >>
>> >> On Thu, Apr 5, 2012 at 12:00 PM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> > Ok, great, I'll give these ideas a try later today. The input is the
>> >> > following line(s) that in my code sample was commented out using ';'
>> >> > in Clojure.
>> >> > The first stage, Q-job, is done fine; it is the second job that gets
>> >> > messed up. The output of Q-job is at
>> >> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job and
>> >> > /lsa4solr/matrix/14099700861483/transpose-213/SSVD-out/Q-job, but
>> >> > BtJob is looking for the input in the wrong place; it must be the
>> >> > hadoop version as you said.
>> >> >
>> >> > input path  #<Path
>> >> > hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120>
>> >> > dd  #<Path[] [Lorg.apache.hadoop.fs.Path;@5563d208>
>> >> > numCol  1000
>> >> > numrow  15982
>> >> >
>> >> > On Thu, Apr 5, 2012 at 11:54 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >
>> >> >> Another idea I have is to try to run it from just the Mahout
>> >> >> command line and see if it works with .205. If it does, it is
>> >> >> definitely something about passing parameters in / the client
>> >> >> hadoop classpath / etc.
>> >> >>
>> >> >> On Thu, Apr 5, 2012 at 11:51 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> > Also, you are printing your input path -- what does it look like
>> >> >> > in reality? Because the path it complains about, SSVDOutput/data,
>> >> >> > should in fact be the input path. That's what's perplexing.
>> >> >> >
>> >> >> > We are talking about the hadoop job setup process here, nothing
>> >> >> > specific to the solution itself. And job setup/directory
>> >> >> > management fails for some reason.
>> >> >> >
>> >> >> > On Thu, Apr 5, 2012 at 11:45 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >> Any chance you could test it with its current dependency,
>> >> >> >> 0.20.204? Or would that be hard to stage?
>> >> >> >>
>> >> >> >> A newer hadoop version is frankly all I can think of as the
>> >> >> >> reason for this.
>> >> >> >>
>> >> >> >> On Thu, Apr 5, 2012 at 11:35 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >> >>> Hi Dmitriy,
>> >> >> >>>
>> >> >> >>> It is Clojure code from: https://github.com/algoriffic/lsa4solr
>> >> >> >>> Of course I modified it to use the Mahout .6 distribution, also
>> >> >> >>> running on hadoop-0.20.205.0. Here is the Clojure code that I
>> >> >> >>> changed; the lines after 'decomposer (doto (.run ssvdSolver))'
>> >> >> >>> still need modification b/c I'm not reading the
>> >> >> >>> eigenvalues/eigenvectors from the solver correctly. Originally
>> >> >> >>> this code was based on Mahout .4. I'm creating the matrix from
>> >> >> >>> Solr 3.1.0, very similar to what was done on
>> >> >> >>> 'https://github.com/algoriffic/lsa4solr'.
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>>
>> >> >> >>> (defn decompose-svd
>> >> >> >>>   [mat k]
>> >> >> >>>   ;(println "input path " (.getRowPath mat))
>> >> >> >>>   ;(println "dd " (into-array [(.getRowPath mat)]))
>> >> >> >>>   ;(println "numCol " (.numCols mat))
>> >> >> >>>   ;(println "numrow " (.numRows mat))
>> >> >> >>>   (let [eigenvalues (new java.util.ArrayList)
>> >> >> >>>         eigenvectors (DenseMatrix. (+ k 2) (.numCols mat))
>> >> >> >>>         numCol (.numCols mat)
>> >> >> >>>         config (.getConf mat)
>> >> >> >>>         rawPath (.getRowPath mat)
>> >> >> >>>         outputPath (Path. (str (.toString rawPath) "/SSVD-out"))
>> >> >> >>>         inputPath (into-array [rawPath])
>> >> >> >>>         ssvdSolver (SSVDSolver. config inputPath outputPath 1000 k 60 3)
>> >> >> >>>         decomposer (doto (.run ssvdSolver))
>> >> >> >>>         V (normalize-matrix-columns
>> >> >> >>>            (.viewPart (.transpose eigenvectors)
>> >> >> >>>                       (int-array [0 0])
>> >> >> >>>                       (int-array [(.numCols mat) k])))
>> >> >> >>>         U (mmult mat V)
>> >> >> >>>         S (diag (take k (reverse eigenvalues)))]
>> >> >> >>>     {:U U
>> >> >> >>>      :S S
>> >> >> >>>      :V V}))
>> >> >> >>>
>> >> >> >>> On Thu, Apr 5, 2012 at 11:10 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >>>
>> >> >> >>>> Yeah, I don't see how it may have arrived at that error.
>> >> >> >>>>
>> >> >> >>>> Peyman,
>> >> >> >>>>
>> >> >> >>>> I need to know more -- it looks like you are using the embedded
>> >> >> >>>> API, not the command line, so I need to see how you initialize
>> >> >> >>>> the solver and also which version of the Mahout libraries you
>> >> >> >>>> are using (your stack trace numbers do not correspond to
>> >> >> >>>> anything reasonable on current trunk).
>> >> >> >>>>
>> >> >> >>>> thanks.
>> >> >> >>>>
>> >> >> >>>> -d
>> >> >> >>>>
>> >> >> >>>> On Thu, Apr 5, 2012 at 10:55 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >>>> > Hm, I never saw that and am not sure where this folder comes
>> >> >> >>>> > from. Which hadoop version are you using? This may be a result
>> >> >> >>>> > of incompatible support for multiple outputs in the newer
>> >> >> >>>> > hadoop versions. I tested it with CDH3u0/u3 and it was fine.
>> >> >> >>>> > This folder should normally appear in the conversation; I
>> >> >> >>>> > suspect it is an internal hadoop thing.
>> >> >> >>>> >
>> >> >> >>>> > This is without me actually looking at the code, per the
>> >> >> >>>> > stack trace.
>> >> >> >>>> >
>> >> >> >>>> > On Thu, Apr 5, 2012 at 5:22 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >> >>>> >> Hi Guys,
>> >> >> >>>> >> I'm now using ssvd for my LSA code and get the following
>> >> >> >>>> >> error. At the time of the error, all I have under the
>> >> >> >>>> >> 'SSVD-out' folder is:
>> >> >> >>>> >> Q-job/QHat-m-00000
>> >> >> >>>> >> Q-job/R-m-00000
>> >> >> >>>> >> Q-job/_SUCCESS
>> >> >> >>>> >> Q-job/part-m-00000.deflate
>> >> >> >>>> >>
>> >> >> >>>> >> I'm not clear where the '/data' folder is supposed to be
>> >> >> >>>> >> set; is it part of the output of the QJob? I don't see any
>> >> >> >>>> >> error in the QJob.
>> >> >> >>>> >>
>> >> >> >>>> >> Thanks,
>> >> >> >>>> >> SEVERE: java.io.FileNotFoundException: File does not exist:
>> >> >> >>>> >> hdfs://localhost:9000/lsa4solr/matrix/15835804941333/transpose-120/SSVD-out/data
>> >> >> >>>> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:534)
>> >> >> >>>> >>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
>> >> >> >>>> >>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:954)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:971)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)
>> >> >> >>>> >>     at java.security.AccessController.doPrivileged(Native Method)
>> >> >> >>>> >>     at javax.security.auth.Subject.doAs(Subject.java:396)
>> >> >> >>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >> >> >>>> >>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)
>> >> >> >>>> >>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>> >> >> >>>> >>     at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:505)
>> >> >> >>>> >>     at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:347)
>> >> >> >>>> >>     at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:188)
>> >> >> >>>> >>     at lsa4solr.clustering_protocol$decompose_term_doc_matrix.invoke(clustering_protocol.clj:125)
>> >> >> >>>> >>     at lsa4solr.clustering_protocol$cluster_kmeans_docs.invoke(clustering_protocol.clj:142)
>> >> >> >>>> >>     at lsa4solr.cluster$cluster_dispatch.invoke(cluster.clj:72)
>> >> >> >>>> >>     at lsa4solr.cluster$_cluster.invoke(cluster.clj:103)
>> >> >> >>>> >>     at lsa4solr.cluster.LSAClusteringEngine.cluster(Unknown Source)
>> >> >> >>>> >>     at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
>> >> >> >>>> >>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>> >> >> >>>> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>> >> >> >>>> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>> >> >> >>>> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>> >> >> >>>> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>> >> >> >>>> >>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> >> >> >>>> >>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>> >> >> >>>> >>
>> >> >> >>>> >> On Sun, Feb 26, 2012 at 4:56 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >>>> >>
>> >> >> >>>> >>> For the third time: in the context of LSA, a faster and
>> >> >> >>>> >>> hence perhaps better alternative to Lanczos is ssvd. Is
>> >> >> >>>> >>> there any specific reason you want to use the Lanczos
>> >> >> >>>> >>> solver in the context of LSA?
>> >> >> >>>> >>>
>> >> >> >>>> >>> -d
>> >> >> >>>> >>>
>> >> >> >>>> >>> On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >> >>>> >>> > Hi Guys,
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Per your advice I upgraded to Mahout .6 and made a bunch
>> >> >> >>>> >>> > of API changes, and in the meantime realized I had a bug
>> >> >> >>>> >>> > with my input matrix: zero rows were read from Solr b/c
>> >> >> >>>> >>> > multiple fields in Solr were indexed, not just the one I
>> >> >> >>>> >>> > was interested in. That issue is fixed and I have a matrix
>> >> >> >>>> >>> > with these dimensions: (.numCols mat) 1000, (.numRows mat)
>> >> >> >>>> >>> > 15932 (or the transpose).
>> >> >> >>>> >>> > Unfortunately I'm getting the below error now. In the
>> >> >> >>>> >>> > context of some other Mahout algorithm there was a mention
>> >> >> >>>> >>> > of '/tmp' vs '/_tmp' causing this issue, but in this
>> >> >> >>>> >>> > particular case the matrix is in memory!! I'm using this
>> >> >> >>>> >>> > google package: guava-r09.jar
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > SEVERE: java.util.NoSuchElementException
>> >> >> >>>> >>> >     at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
>> >> >> >>>> >>> >     at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
>> >> >> >>>> >>> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
>> >> >> >>>> >>> >     at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
>> >> >> >>>> >>> >     at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Any suggestion?
>> >> >> >>>> >>> > Thanks,
>> >> >> >>>> >>> > Peyman
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >> Peyman,
>> >> >> >>>> >>> >>
>> >> >> >>>> >>> >> Yes, what Ted said. Please take the 0.6 release. Also
>> >> >> >>>> >>> >> try ssvd; it may benefit you in some regards compared
>> >> >> >>>> >>> >> to Lanczos.
>> >> >> >>>> >>> >>
>> >> >> >>>> >>> >> -d
>> >> >> >>>> >>> >>
>> >> >> >>>> >>> >> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <mohajeri@gmail.com> wrote:
>> >> >> >>>> >>> >>> Hi Dmitriy & Others,
>> >> >> >>>> >>> >>>
>> >> >> >>>> >>> >>> Dmitriy, thanks for your previous response.
>> >> >> >>>> >>> >>> I have a follow-up question on my LSA project. I have
>> >> >> >>>> >>> >>> managed to upload 1,500 documents from two different
>> >> >> >>>> >>> >>> newsgroups (one about graphics and one about Atheism,
>> >> >> >>>> >>> >>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to
>> >> >> >>>> >>> >>> Solr. However my LanczosSolver in Mahout .4 does not
>> >> >> >>>> >>> >>> find any eigenvalues (there are eigenvectors, as you see
>> >> >> >>>> >>> >>> in the follow-up logs).
>> >> >> >>>> >>> >>> The only thing I'm doing differently from
>> >> >> >>>> >>> >>> (https://github.com/algoriffic/lsa4solr) is that I'm not
>> >> >> >>>> >>> >>> using the 'Summary' field but rather the actual 'text'
>> >> >> >>>> >>> >>> field in Solr. I'm assuming the issue is that the
>> >> >> >>>> >>> >>> Summary field already removes the noise and makes the
>> >> >> >>>> >>> >>> clustering work, and the raw index data does not do
>> >> >> >>>> >>> >>> that. Am I correct, or are there other potential
>> >> >> >>>> >>> >>> explanations? For the desired rank I'm using values
>> >> >> >>>> >>> >>> between 10-100 and looking for #clusters between 2-10
>> >> >> >>>> >>> >>> (different values for different trials), but always the
>> >> >> >>>> >>> >>> same result comes out: no clusters found.
>> >> >> >>>> >>> >>> If my issue is related to not having summarization done,
>> >> >> >>>> >>> >>> how can that be done in Solr? I wasn't able to find a
>> >> >> >>>> >>> >>> Summary field in Solr.
>> >> >> >>>> >>> >>>
>> >> >> >>>> >>> >>> Thanks
>> >> >> >>>> >>> >>> Peyman
>> >> >> >>>> >>> >>>
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 1 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 2 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 3 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 4 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 5 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 6 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 7 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 8 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 9 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: Eigenvector 10 found with eigenvalue 0.0
>> >> >> >>>> >>> >>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>> >> >> >>>> >>> >>> INFO: LanczosSolver finished.
>> >> >> >>>> >>> >>>
>> >> >> >>>> >>> >>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >> >> >>>> >>> >>>> In Mahout an LSA pipeline is possible with the
>> >> >> >>>> >>> >>>> seqdirectory, seq2sparse and ssvd commands. The nuances
>> >> >> >>>> >>> >>>> are understanding the dictionary format and LLR
>> >> >> >>>> >>> >>>> analysis of n-grams, and perhaps using a slightly
>> >> >> >>>> >>> >>>> better lemmatizer than the default one.
>> >> >> >>>> >>> >>>>
>> >> >> >>>> >>> >>>> With the indexing part you are on your own at this point.
>> >> >> >>>> >>> >>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <mohajeri@gmail.com> wrote:
>> >> >> >>>> >>> >>>>
>> >> >> >>>> >>> >>>>> Hi Guys,
>> >> >> >>>> >>> >>>>>
>> >> >> >>>> >>> >>>>> I'm interested in this work:
>> >> >> >>>> >>> >>>>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html
>> >> >> >>>> >>> >>>>>
>> >> >> >>>> >>> >>>>> I looked at some of the comments and noticed that
>> >> >> >>>> >>> >>>>> there was interest in incorporating it into Mahout,
>> >> >> >>>> >>> >>>>> back in 2010. I'm also having issues running this code
>> >> >> >>>> >>> >>>>> due to dependencies on an older version of Mahout.
>> >> >> >>>> >>> >>>>>
>> >> >> >>>> >>> >>>>> I was wondering if LSA is now directly available in
>> >> >> >>>> >>> >>>>> Mahout? Also, if I upgrade to the latest Mahout, would
>> >> >> >>>> >>> >>>>> this Clojure code work?
>> >> >> >>>> >>> >>>>>
>> >> >> >>>> >>> >>>>> Thanks
>> >> >> >>>> >>> >>>>> Peyman
