mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: Text Classification using Mahout
Date Wed, 29 Sep 2010 14:32:24 GMT

On Sep 28, 2010, at 2:54 PM, Ted Dunning wrote:

> Neil,
> That example should be updated to the current trunk version of the software.  That isn't
> likely to happen right away, so you should adapt the procedures.
> On Tue, Sep 28, 2010 at 10:49 AM, Neil Ghosh <> wrote:
> Hi Grant,
> I am trying to run the classification example in
> doing the step 3. ant install
> We don't use ant any more.

I used Ant to build/run the examples.  The examples came w/ Mahout already built, so no need
for Maven for the examples.

> You should use 'mvn install' here instead.  Make sure you have checked out the trunk
> version of the software.
> However, it is trying to download the 2GB file. I might run out of space in
> my Linux partition, and the download may be interrupted on my connection.
> Yes.  These could happen.  If this is a problem, you might want to invest a tiny amount
> of money to rent an EC2 machine for a few hours.  This literally will be less than a dollar,
> even if you have to go through the process several times.

Yes, it is going to get the Wikipedia data set.  It expands to about 10GB, if I recall.
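
The two worries raised above (running out of disk space and an interrupted download) can both be checked before committing to the full data set. What follows is only a minimal sketch, assuming Python 3 and a server that supports HTTP Range requests. The function names and any URL passed in are hypothetical placeholders, not part of Mahout or its build.

```python
import os
import shutil
import urllib.request


def has_room(directory, needed_bytes):
    """Return True if the filesystem holding `directory` has at least needed_bytes free."""
    return shutil.disk_usage(directory).free >= needed_bytes


def build_resume_request(url, dest_path):
    """Build a request that asks only for the bytes not yet on disk.

    Returns the request and the number of bytes already present.
    """
    already = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url)
    if already > 0:
        # HTTP Range header: resume from the first missing byte.
        request.add_header("Range", "bytes=%d-" % already)
    return request, already


def download(url, dest_path, chunk_size=1 << 20):
    """Download url to dest_path, appending to a partial file if one exists."""
    request, already = build_resume_request(url, dest_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header; otherwise start over.
        mode = "ab" if already and response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

A caller would typically check something like `has_room(".", 12 * 1024**3)` first, leaving headroom above the roughly 10GB expanded size mentioned above, and then call `download(...)` again after any dropped connection. If the partial file is already complete, the server may answer with an error status, which `urlopen` raises as an exception.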

> Yes
> is there any way I can test the example in a smaller set of wikipedia data
> or download the data offline ?
> Sure.  Try the 20newsgroups examples.

Yep, the principles here are the same.  For the Wikipedia example, all I did was classify into Democrats
and Republicans, but the underlying process really is no different.
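
To illustrate why the process does not depend on the data set, here is a small self-contained sketch of a text classifier. It assumes a multinomial Naive Bayes model with add-one smoothing, which this thread does not state is what the example uses. It is plain Python, not Mahout's API, and the toy documents and labels are made up.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """labeled_docs: iterable of (label, text). Returns the counts needed for scoring."""
    doc_counts = Counter()              # number of training documents per label
    word_counts = defaultdict(Counter)  # per-label token frequencies
    vocabulary = set()
    for label, text in labeled_docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: the labels could just as well be two parties or twenty newsgroups.
toy = [
    ("hockey", "the goalie stopped the puck"),
    ("hockey", "puck dropped at center ice"),
    ("space", "the shuttle reached orbit"),
    ("space", "orbit insertion burn for the probe"),
]
model = train(toy)
print(classify(model, "puck on the ice"))
```

Only the `(label, text)` pairs change between data sets; the training and scoring steps stay the same, which is the point made above.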

> Also, you can download the wikipedia test data any way  you like.  

Grant Ingersoll Apache Lucene/Solr Conference, Boston Oct 7-8
