mahout-user mailing list archives

From Pedro Oliveira <>
Subject Re: Multi-relational data
Date Wed, 05 May 2010 13:29:29 GMT

On Wed, May 5, 2010 at 8:41 AM, Sean Owen <> wrote:

> You might have to be more specific. Support for this in the context of
> what: recommendations, clustering, ...?

Classification, clustering, and recommendation are the most important ones.

> You can probably fit such concepts into any framework with enough
> cleverness, so in that sense, as a general framework, sure I don't see
> why any algorithm couldn't eventually be applied to such data.
> This is a fairly specific kind of data model, so I am not sure if it
> would be something explicit supported in some special way.

I'm currently working on a system that implements several non-parametric
machine learning techniques to work with multi-relational data (K-Medoids,
KNN, etc), and it works quite nicely with data that fits in memory. However,
I have some new huge datasets, and I'll probably need to use some kind of
parallelization, and Mahout seems like a good solution. The main purpose of my
email was to see if anyone else out there is working on the same thing.
From a quick look at the code, a straightforward solution would be to define
a new type of Vector (it wouldn't be a vector in the mathematical sense,
just a way to store relational information about an instance), and some
DistanceMeasures to work with that vector. Then we could use distance-based
techniques, such as canopy clustering and k-means.
Are there any plans to implement more distance-based (or kernel-based)
algorithms, such as SVMs and KNN?
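
To make the idea above concrete, here is a minimal sketch in plain Java of a
relational instance plus a distance over it. It deliberately does not use
Mahout's Vector or DistanceMeasure interfaces: RelationalInstance,
RelationalDistance and the choice of a per-attribute Jaccard distance are all
hypothetical, chosen only for illustration. It also compares only direct
attribute values, so it does not capture second-order relations such as
acquaintances of acquaintances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical container (not a Mahout Vector): attribute name -> set of values.
final class RelationalInstance {
    final String id;
    // Sparse storage: only attributes that actually have values get an entry.
    private final Map<String, Set<String>> values = new HashMap<>();

    RelationalInstance(String id) {
        this.id = id;
    }

    // Adds one value for an attribute; the value may be the id of another instance.
    RelationalInstance add(String attribute, String value) {
        values.computeIfAbsent(attribute, k -> new HashSet<>()).add(value);
        return this;
    }

    // Returns the values of an attribute, or an empty set if it is absent.
    Set<String> get(String attribute) {
        Set<String> s = values.get(attribute);
        return s == null ? Collections.<String>emptySet() : s;
    }

    Set<String> attributes() {
        return values.keySet();
    }
}

// Hypothetical distance (an assumption, not Mahout's DistanceMeasure API).
final class RelationalDistance {
    // Jaccard distance between two value sets: 1 - |intersection| / |union|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    // Mean Jaccard distance over every attribute present in either instance.
    static double distance(RelationalInstance x, RelationalInstance y) {
        Set<String> attrs = new HashSet<>(x.attributes());
        attrs.addAll(y.attributes());
        if (attrs.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String attr : attrs) {
            sum += jaccard(x.get(attr), y.get(attr));
        }
        return sum / attrs.size();
    }
}

final class RelationalDistanceDemo {
    public static void main(String[] args) {
        RelationalInstance alice = new RelationalInstance("alice")
                .add("knows", "bob").add("knows", "carol").add("city", "lisbon");
        RelationalInstance dave = new RelationalInstance("dave")
                .add("knows", "bob").add("city", "lisbon");
        System.out.println(RelationalDistance.distance(alice, dave));
    }
}
```

A distance like this could in principle be plugged into any distance-based
method (k-medoids, KNN, canopy clustering); k-means would additionally need a
notion of centroid, which this sketch does not define.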


> On Wed, May 5, 2010 at 1:26 PM, Pedro Oliveira <> wrote:
> > Hi,
> >
> > I have a simple question: does Mahout support, or plan to support,
> > multi-relational datasets?
> > I.e., datasets where each instance can have a variable number of values in
> > an attribute, and values can be other instances?
> > The basic example is a social network, where each person has several
> > attributes, and some attributes, like "knows", can have several distinct
> > values, and these values are other persons.
> > These datasets are usually very sparse (there are lots of distinct
> > attributes, but each instance only has values for a few of them), and the
> > relational information is very relevant (in the social network example, the
> > acquaintances of our acquaintances are relevant).
> >
> >
> > Cheers,
> > Pedro
> >
