mahout-user mailing list archives

From Chirag Lakhani <clakh...@zaloni.com>
Subject Re: database support for clustering
Date Tue, 25 Jun 2013 13:28:54 GMT
Just to clarify: would the UDF convert the data into a dense or a sparse
vector format?


On Mon, Jun 24, 2013 at 12:55 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Better would be to build a Hive UDF that vectorizes your data directly from
> the Hive table and produces a sequence file with vectors ready to cluster.
>  Then use the streaming k-means stuff.
>
>
>
> On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <clakhani@zaloni.com>
> wrote:
>
> > What database interfaces are there for Mahout?  The website mentions
> > MongoDB and Cassandra.  I get the feeling these are for recommender
> > systems only.  Are there any databases that Mahout can interface with
> > directly in order to perform clustering?
> >
> > I am thinking of an example where I have a large table in Hive of
> > customer data and I want to do customer segmentation.  Normally I make
> > a CSV file of this data and then manually import it into some Java
> > code.  Is there a better way of doing that?
> >
>
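
To make the dense-versus-sparse question concrete, here is a minimal plain-Java sketch of the two representations for a single table row. It deliberately uses neither the Hive UDF API nor Mahout's vector classes; the class name RowVectorizer and both method names are hypothetical, and it assumes each row arrives as a comma-separated string of numeric fields.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hive or Mahout.
final class RowVectorizer {

    // Dense form: one slot per column, zeros stored explicitly.
    static double[] toDense(String csvRow) {
        String[] fields = csvRow.split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Sparse form: only non-zero columns are kept, keyed by column index.
    static Map<Integer, Double> toSparse(String csvRow) {
        double[] dense = toDense(csvRow);
        Map<Integer, Double> nonZero = new TreeMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                nonZero.put(i, dense[i]);
            }
        }
        return nonZero;
    }
}
```

A dense layout is usually fine for a handful of numeric customer attributes; a sparse layout tends to pay off when most columns are zero, for example after one-hot encoding categorical fields.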
