hbase-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: Stack Overflow?
Date Wed, 02 Mar 2011 09:05:33 GMT
You could try using Apache Mahout to at least cluster the messages into groups of similar ones
based on text features. That should be doable. Given the groups, you could manually extract
the questions (the clusters with the most threads are likely the most frequently asked ones).
Also, if you manage to get this working nicely, it could be a useful tool for other projects
as well. It would be a fun exercise anyway...
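To make the clustering idea concrete, here is a minimal sketch in Python. It is not Mahout and does not use Mahout's API: it is a small, dependency-free stand-in that runs spherical k-means (cosine similarity) over TF-IDF vectors and then ranks the clusters by size, assuming each input string holds the text of one thread. The function names (`tfidf_vectors`, `kmeans`, `rank_clusters`, etc.) are hypothetical, and the crude tokenizer and farthest-first seeding are illustrative choices only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude text features: lowercase alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalise(vec):
    # Scale a sparse vector (term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def dot(a, b):
    # Dot product of two sparse vectors; equals cosine similarity for unit vectors.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    # One unit-length TF-IDF vector per document, using smoothed IDF (always > 0).
    token_lists = [tokenize(d) for d in docs]
    df = Counter(t for tokens in token_lists for t in set(tokens))
    n = len(docs)
    return [
        normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                   for t, c in Counter(tokens).items()})
        for tokens in token_lists
    ]


def init_centroids(vectors, k):
    # Farthest-first seeding: start with document 0, then repeatedly add the
    # document least similar to every centroid chosen so far (needs k <= len(vectors)).
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(vectors)) if i not in chosen]
        chosen.append(min(rest, key=lambda i: max(dot(vectors[i], vectors[j]) for j in chosen)))
    return [dict(vectors[i]) for i in chosen]


def kmeans(vectors, k, max_iter=20):
    # Spherical k-means: assign by cosine similarity, recompute unit-length centroids.
    centroids = init_centroids(vectors, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for c in range(k):
            summed = Counter()
            for v, a in zip(vectors, assignment):
                if a == c:
                    summed.update(v)
            if summed:  # keep the old centroid if a cluster ends up empty
                centroids[c] = normalise(dict(summed))
    return assignment


def rank_clusters(docs, k):
    # Group documents by cluster, largest cluster (most threads) first.
    clusters = {}
    for doc, c in zip(docs, kmeans(tfidf_vectors(docs), k)):
        clusters.setdefault(c, []).append(doc)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    threads = [
        "How do I configure the region server heap size?",
        "Region server heap size too small, how to configure?",
        "What heap size should the region server use?",
        "Scan is slow when reading many rows",
        "Why is my scan so slow on many rows?",
    ]
    print([len(c) for c in rank_clusters(threads, 2)])  # prints [3, 2]
```

With k=2 the three heap-size questions land in the first (largest) cluster and the two slow-scan questions in the second. On a real archive you would still read the top clusters by hand to phrase the actual FAQ entries.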

I am starting to toy with Mahout for another pet project. Once I get more comfortable with
it, I might be able to take this on (not a promise).

I think automatic question extraction is quite an ambitious goal.

Friso



On 1 Mar 2011, at 19:12, Stack wrote:

> On Tue, Mar 1, 2011 at 10:03 AM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
>>> Do you have something in mind? Could we be making better use of the
>>> Sematext summaries?
>> 
>> Hm... we already index HBase and other Digests on search-hadoop.com.
>> I was thinking more along the lines of mining the ML archives and doing
>> automatic Q&A extraction.
>> I don't know how difficult it would be.  Maybe the input would be too noisy
>> (people don't ask proper questions, answers are not full sentences, quote
>> characters prefixing lines from old messages add a layer of complexity...), but
>> that's what I thought you might have meant.
>> 
> 
> That'd be a nice addition to the docs.  Our FAQ is in need of
> updating.  This would be a nice undertaking if someone was up for
> taking it on.
> St.Ack
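Otis's concern about quote characters prefixing lines from old messages can at least be reduced with a preprocessing pass that drops quoted lines and reply-attribution lines before any clustering or extraction. The sketch below is a minimal Python version; the names `quote_depth` and `strip_quoted` are hypothetical, and the attribution pattern only recognises single-line "On ... wrote:" headers, so an unquoted attribution wrapped over two lines would slip through.

```python
import re

# Hypothetical pattern for a one-line reply attribution,
# e.g. "On 1 Mar 2011, at 19:12, Stack wrote:".
ATTRIBUTION = re.compile(r"^On .+wrote:\s*$")


def quote_depth(line):
    # Count leading '>' markers, ignoring spaces/tabs between them (">> x" and "> > x" -> 2).
    depth = 0
    for ch in line:
        if ch == ">":
            depth += 1
        elif ch in " \t":
            continue
        else:
            break
    return depth


def strip_quoted(body):
    # Keep only the author's own text: drop quoted lines and attribution lines.
    kept = [
        line for line in body.splitlines()
        if quote_depth(line) == 0 and not ATTRIBUTION.match(line)
    ]
    return "\n".join(kept).strip()
```

Keeping `quote_depth` separate also leaves room for attributing quoted text to the right earlier speaker instead of simply discarding it.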

