mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: M/R Job for Log file to FPG
Date Wed, 02 Jun 2010 12:54:11 GMT
In thinking more about this, it seems that it would be even better to just incorporate some
of the ideas of this into the DocumentProcessor, except I think it is useful to not have to
go to SeqFile first.  Also, it might be worth grabbing Solr's FilterFactory stuff for configuring
the Lucene analyzers.  Not sure how easy that would be to do.

-Grant
On May 28, 2010, at 4:53 PM, Grant Ingersoll wrote:

> OK, I posted a draft patch of this.  Would appreciate a review.  I think it's even the
> case that one could slip Groovy into it (or whatever) through the proper implementation of
> one interface.  Feedback welcome on M-403.
> 
> 
> On May 28, 2010, at 10:05 AM, Grant Ingersoll wrote:
> 
>> https://issues.apache.org/jira/browse/MAHOUT-403
>> 
>> On May 28, 2010, at 8:58 AM, Grant Ingersoll wrote:
>> 
>>> 
>>> On May 27, 2010, at 7:06 PM, Ted Dunning wrote:
>>> 
>>>> That should be a small change (and helpful for a lot of mining tasks).
>>>> 
>>>> But once you jump on that slippery slope, why not allow a tiny Groovy
>>>> closure to be injected?  Or to pass in an object that will extract a map of
>>>> values from each line?
>>> 
>>> Expanding on this, I think we could do the following:
>>> 
>>> Map capturing groups to labels, then have pluggable output so that one could
>>> easily output to FPG, Classifiers, etc.
>>> 
>>> I'm not all that familiar w/ Groovy, so I'll put up my variation and then people
>>> can expand on it.
>>> 
>>> -Grant
>> 
>> 
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
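
For readers following the thread, the idea quoted above (map regex capturing groups to labels, then hand the labeled values to a pluggable output) might look roughly like the sketch below. This is not the MAHOUT-403 patch: the names LogLineExtractor and FieldSink, the sample regex, and the sample log line are all hypothetical and chosen only for illustration. It uses nothing beyond java.util and java.util.regex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the class from the MAHOUT-403 patch. */
public class LogLineExtractor {

  /** Hypothetical plug-in point: receives the labeled fields of each matching line. */
  public interface FieldSink {
    void accept(Map<String, String> fields);
  }

  private final Pattern pattern;
  // Capturing-group index -> label, e.g. 1 -> "ip"
  private final Map<Integer, String> groupLabels;

  public LogLineExtractor(String regex, Map<Integer, String> groupLabels) {
    this.pattern = Pattern.compile(regex);
    this.groupLabels = groupLabels;
  }

  /** Extracts the labeled groups from one line; returns false if the line does not match. */
  public boolean process(String line, FieldSink sink) {
    Matcher m = pattern.matcher(line);
    if (!m.find()) {
      return false;
    }
    Map<String, String> fields = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> e : groupLabels.entrySet()) {
      String value = m.group(e.getKey());
      if (value != null) { // optional groups that did not participate are skipped
        fields.put(e.getValue(), value);
      }
    }
    sink.accept(fields);
    return true;
  }

  public static void main(String[] args) {
    Map<Integer, String> labels = new LinkedHashMap<>();
    labels.put(1, "ip");
    labels.put(2, "query");
    // Assumed log layout: "<ip> <method> <path>?q=<query>"
    LogLineExtractor extractor = new LogLineExtractor("^(\\S+) .*\\?q=(\\S+)", labels);

    // One possible sink: join the values into a tab-separated "transaction" line,
    // the kind of input a frequent-pattern miner could consume.
    List<String> transactions = new ArrayList<>();
    FieldSink sink = fields -> transactions.add(String.join("\t", fields.values()));

    extractor.process("127.0.0.1 GET /search?q=mahout", sink);
    System.out.println(transactions);
  }
}
```

Swapping the sink is the "pluggable output" part: a different FieldSink could emit labeled examples for a classifier instead of transactions, and a script-backed implementation of the same interface would be one way to slip in Groovy as suggested earlier in the thread.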

