mahout-user mailing list archives

From Benson Margulies
Subject Re: A hadoop novice meets mahout
Date Fri, 29 May 2009 16:04:56 GMT
The Shashikant code ends up with a SparseVector. There must be some easy
way to pull in a SparseVector instead of a DenseVector. The
SparseVector reader wants a DataInput, and the InputMapper has a Text, but
perhaps a quick StringReader is all I need.
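
One caveat on the StringReader idea: java.io.StringReader is a character Reader and does not implement DataInput, whereas a DataInputStream wrapped around a ByteArrayInputStream does. Below is a minimal sketch of that wrapping, using only java.io. The class name, the helper methods, and the byte layout (a count followed by index/value pairs) are all hypothetical and are not Mahout's actual SparseVector format. It also assumes the bytes held by the Text value really are the binary encoding the reader expects, which may not be true of the MAHOUT-126 output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class SparseBytesSketch {

    // Hypothetical layout, NOT Mahout's real format:
    // an int entry count, then that many (int index, double value) pairs.
    public static byte[] write(Map<Integer, Double> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(entries.size());
            for (Map.Entry<Integer, Double> e : entries.entrySet()) {
                out.writeInt(e.getKey());
                out.writeDouble(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the hypothetical layout back from any DataInput.
    public static Map<Integer, Double> read(DataInput in) {
        try {
            int count = in.readInt();
            Map<Integer, Double> entries = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                double value = in.readDouble();
                entries.put(index, value);
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The step in question: raw bytes become a DataInput.
    public static DataInput asDataInput(byte[] bytes) {
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        Map<Integer, Double> original = new TreeMap<>();
        original.put(3, 0.5);
        original.put(42, 1.25);
        Map<Integer, Double> copy = read(asDataInput(write(original)));
        System.out.println(copy); // prints {3=0.5, 42=1.25}
    }
}
```

If the Text value instead holds a human-readable line, wrapping its bytes this way will not help, and the line would have to be parsed as text.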

The code in the example

On Fri, May 29, 2009 at 12:00 PM, Grant Ingersoll wrote:

> I think Shashikant was using a modified form of Mahout that encoded the
> labels in the output.
> I think we're still a little bit away from having a utility that truly
> makes this straightforward to go from text to clusterable vectors.
> No doubt what is happening is the recognition of a need for some type of
> pipeline process that can work with multiple data sources and output various
> consumable formats and help select features.  Unfortunately, we aren't there
> just yet.
> -Grant
> On May 29, 2009, at 11:27 AM, Benson Margulies wrote:
>> I'll fish for one more hint. I'm using the MAHOUT-126 code to turn text
>> into data via TF-IDF. What comes out of there is not in the same format as
>> your example data. This means that I need a different InputDriver? Is one
>> lying about for the format written by that DocumentVector class?
>> On Fri, May 29, 2009 at 10:29 AM, Jeff Eastman wrote:
>>> Benson Margulies wrote:
>>>> OK, I've got some inputs, I want to run k-means, how do I feed the
>>>> beast?
>>> Make sure you can run the Synthetic Control example to get everything
>>> wired
>>> together correctly: JDK, Hadoop, Mahout. See
>>> Then write an
>>> input job to convert your data similar to
>>> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/
>>> and make a new job like
>>> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/
>>> You will have a small adventure and then be operational.
>>> Have fun,
>>> Jeff
