mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Document summarization with lsa - blog post series
Date Mon, 17 Sep 2012 18:43:51 GMT
Very nice post. Thanks.

I wonder if another problem that could benefit from the same approach is finding cluster names.
Imagine finding the most important sentence of the cluster, instead of a single doc, using
the same methods (break docs into sentences, etc.). Then use parts of speech to condense it to
a noun+verb or noun phrase as a candidate cluster name. Or just use the most important sentence
as is.

Also the most important few sentences might be a reasonable cluster summary.

Using the top terms from the centroid doesn't produce very satisfactory names, and though the
term cloud can be a somewhat useful cluster summary, it's much harder to comprehend than a
few sentences. One question is whether choosing sentences from different docs for the cluster
summary might produce gibberish.
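
To make the idea concrete, here is a rough sketch of pooling the sentences of every doc in a cluster and ranking them together. It is a simplified stand-in, not the code from the blog posts: it uses raw term counts and only the leading singular vector of the term-by-sentence matrix, computed by power iteration rather than a full SVD. The names `split_sentences`, `tokenize`, and `rank_cluster_sentences` are made up for illustration, and the regex sentence splitter is an assumption standing in for a real sentence detector such as OpenNLP.

```python
import math
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive splitter on ., ! and ?; a real pipeline would use a trained sentence detector.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def rank_cluster_sentences(docs, iterations=50):
    """Rank all sentences in a cluster by their weight in the leading
    right singular vector of the term-by-sentence count matrix A."""
    sentences = [s for doc in docs for s in split_sentences(doc)]
    columns = [Counter(tokenize(s)) for s in sentences]  # one column of A per sentence
    v = [1.0] * len(sentences)
    for _ in range(iterations):
        # u = A v: accumulate each sentence's weight onto the terms it contains.
        u = defaultdict(float)
        for weight, column in zip(v, columns):
            for term, count in column.items():
                u[term] += count * weight
        # v = A^T u: score each sentence by the accumulated weight of its terms.
        v = [sum(count * u[term] for term, count in column.items()) for column in columns]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # no tokens at all, nothing to rank
        v = [x / norm for x in v]
    # Highest-weighted sentence first.
    return sorted(zip(v, sentences), reverse=True)


cluster = [
    "Mahout clusters documents. Mahout clusters documents with svd and mahout clusters terms.",
    "Cats sleep.",
]
best_score, best_sentence = rank_cluster_sentences(cluster)[0]
print(best_sentence)
```

The first entry would be the candidate cluster name (before any part-of-speech condensing), and the first few entries a candidate cluster summary. Since the sentences are pooled across docs, the top few can come from different docs, which is exactly where the gibberish question comes in.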

On Sep 6, 2012, at 1:47 PM, Lance Norskog <> wrote:

I stole the SVD code from Mahout, ported OpenNLP to Solr, wrote a
document summarizer, and benchmarked it all:

Document Summarization with LSA: Threat? Or Menace?

Please critique: what did I completely miss, in the posts or the research?

Lance Norskog
