mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Text Classification using Mahout
Date Wed, 29 Sep 2010 14:32:24 GMT

On Sep 28, 2010, at 2:54 PM, Ted Dunning wrote:

> Neil,
> 
> That example should be updated to the current trunk version of the software.  That isn't
> likely to happen right away, so you should adapt the procedures.
> 
> On Tue, Sep 28, 2010 at 10:49 AM, Neil Ghosh <neil.ghosh@gmail.com> wrote:
> Hi Grant,
> 
> I am trying to run the classification example in
> 
> http://www.ibm.com/developerworks/java/library/j-mahout/
> 
> doing step 3: ant install
> 
> We don't use Ant any more.

I used Ant to build/run the examples.  The examples came with Mahout already built, so there
was no need for Maven for the examples.

> 
> You should use 'mvn install' here instead.  Make sure you have checked out the trunk
> version of the software.
>  
> 
> However, it is trying to download the 2GB file. I might run out of space in
> my Linux partition, and the download may be interrupted by my connection.
> 
> Yes.  These could happen.  If this is a problem, you might want to invest a tiny amount
> of money to rent an EC2 machine for a few hours.  This literally will be less than a dollar,
> even if you have to go through the process several times.

Yes, it is going to get the Wikipedia data set.  It expands to about 10GB, if I recall.

> 
> Yes.
> Is there any way I can test the example on a smaller set of Wikipedia data,
> or download the data offline?
> 
> Sure.  Try the 20newsgroups examples.

Yep, the principles here are the same.  For Wikipedia, all I did was classify into Democrats
and Republicans, but the underlying process really is no different.

> 
> Also, you can download the Wikipedia test data any way you like.

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

