mahout-user mailing list archives

From vineet yadav <vineet.yadav.i...@gmail.com>
Subject Re: Regarding classification of URL's
Date Tue, 01 Mar 2011 11:57:26 GMT
Hi Arjun,
You need to scrape the content of the web page for each given URL, and then
prepare training datasets from the scraped content for Bayesian
classification.
Also check out the Mahout Twenty Newsgroups example for reference:
https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
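To make the two steps concrete (scrape a page's text, then train and apply a Bayesian classifier), here is a minimal sketch in plain Python. It is NOT Mahout's API. The helper names fetch_text, html_to_text, tokenize and NaiveBayes are hypothetical, and the classifier is a textbook multinomial naive Bayes with add-one smoothing, chosen only to illustrate the idea. For Mahout's own Bayes trainer and classifier, follow the Twenty Newsgroups example linked above.

```python
import math
import re
import urllib.request
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags and return the page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url):
    """Hypothetical helper: download a page (the 'scrape' step) and extract its text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return html_to_text(resp.read().decode("utf-8", errors="replace"))


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple assumption."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods of each token
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best
```

In use, you would call fetch_text on many labeled URLs per category, pass each result to train, and then call classify on the text of a new page. A real training set needs far more than a handful of pages per category for the predictions to be reliable.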
Thanks
Vineet Yadav

On Tue, Mar 1, 2011 at 5:05 PM, Arjun Kumar Reddy
<charjunkumar.reddy@iiitb.net> wrote:
> Hi list,
>
> I am a newbie to Mahout and I want to know some details about this
> project.
>
> I need a classification tool that tells me which category a URL or
> its content belongs to.
>
> For example, if I give it this URL:
>
> http://www.espncricinfo.com/icc_cricket_worldcup2011/content/current/player/49764.html
> it should give me the category "cricket".
>
> I was able to do this with other existing APIs such as Alchemy, Evri and
> TextWise, and I am looking for something with better performance.
>
> Could anyone please explain how I can use Mahout for classifying
> documents?
>
>
> Thanks and regards,
> Ch. Arjun Kumar Reddy,
> International Institute of Information Technology – Bangalore (IIITB),
> 26/C, Electronics City, Hosur Road,
> Bangalore 560 100
> Ph: 8800710999
>
