mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Samsara's learning curve
Date Wed, 29 Mar 2017 16:37:23 GMT
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel <> wrote:

> The other missing bit is dataframes. R and Spark have them in different
> forms but Mahout largely ignores the issue of real world object ids.

Mahout only supports matrices and vectors, not data frames.

Data frames imply a mix of various data types that have yet to be converted
to numerical data before an algebraic algorithm can consume them (in R, this
is usually done via a formula). Unfortunately, Mahout has no extension for
formulas. As for data frames themselves, native data frames (e.g., Spark
data frames specifically) usually work reasonably well for vectorizing
non-numerical data.
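To make the "vectorization" step concrete, here is a minimal plain-Python sketch of what a formula-like expansion does before the data can reach an algebraic solver. It is not Mahout or Spark code; the function name `vectorize` and the sample columns are hypothetical. Numeric columns pass through as floats, and each categorical column is one-hot encoded with one indicator column per observed level. Unlike R's default treatment of a formula with an intercept, this sketch keeps every level instead of dropping a reference level.

```python
from typing import Any, Dict, List, Sequence, Tuple


def vectorize(
    rows: Sequence[Dict[str, Any]],
    numeric: Sequence[str],
    categorical: Sequence[str],
) -> Tuple[List[str], List[List[float]]]:
    """Expand mixed-type records into a numeric design matrix (hypothetical helper)."""
    # Collect the observed levels of each categorical column, sorted for a stable layout.
    levels = {c: sorted({str(r[c]) for r in rows}) for c in categorical}
    header = list(numeric) + [f"{c}={lvl}" for c in categorical for lvl in levels[c]]
    matrix: List[List[float]] = []
    for r in rows:
        # Numeric columns are copied through as floats.
        vec = [float(r[c]) for c in numeric]
        # Each categorical value becomes a block of 0/1 indicator columns.
        for c in categorical:
            vec.extend(1.0 if str(r[c]) == lvl else 0.0 for lvl in levels[c])
        matrix.append(vec)
    return header, matrix


# Usage with made-up records.
rows = [
    {"age": 31, "city": "Paris", "clicks": 4},
    {"age": 25, "city": "Oslo", "clicks": 1},
    {"age": 40, "city": "Paris", "clicks": 7},
]
header, X = vectorize(rows, numeric=["age", "clicks"], categorical=["city"])
print(header)  # ['age', 'clicks', 'city=Oslo', 'city=Paris']
print(X[0])    # [31.0, 4.0, 0.0, 1.0]
```

The resulting `X` is purely numeric, so it could be loaded into a matrix type. The `header` list is exactly the kind of column-label information that the matrix itself would not carry.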

Distributed matrices indeed do not support column labels, and row labels
are only quasi-supported: they share their nature with the unordered row
index for transposition purposes. That is, one can either have row labels
with limited transposition semantics, or have integer labels interpreted as
column indices for transposition purposes, but not both.
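The labels-versus-transposition trade-off can be modeled in a few lines. The following is a toy plain-Python sketch, not Mahout's distributed-matrix API. The names `transpose_int_keyed` and `index_labels` are hypothetical, and a sparse matrix is assumed to be a dict of row key to {column index: value}. With integer keys, each row key becomes a column index of the transpose. Arbitrary labels must first be replaced by ordinals, and the label list then has to be kept on the side, because the transposed matrix has no place to store column labels.

```python
from typing import Dict, Hashable, List, Tuple

SparseRows = Dict[int, Dict[int, float]]


def transpose_int_keyed(rows: SparseRows) -> SparseRows:
    """Transpose a sparse matrix whose row keys are integer indices (toy model)."""
    out: SparseRows = {}
    for i, row in rows.items():
        # Only integer keys can serve as column indices of the transpose.
        if not isinstance(i, int):
            raise TypeError(f"row key {i!r} is not an integer index")
        for j, v in row.items():
            out.setdefault(j, {})[i] = v
    return out


def index_labels(
    rows: Dict[Hashable, Dict[int, float]]
) -> Tuple[SparseRows, List[Hashable]]:
    """Swap arbitrary row labels for ordinal keys; return the labels separately."""
    labels = list(rows)
    return {i: rows[lbl] for i, lbl in enumerate(labels)}, labels


# Usage: string-labeled rows cannot be transposed directly...
labeled = {"user-a": {0: 1.0, 2: 3.0}, "user-b": {1: 5.0}}
try:
    transpose_int_keyed(labeled)
    rejected = False
except TypeError:
    rejected = True

# ...but after re-indexing they can, at the cost of tracking labels externally.
int_keyed, labels = index_labels(labeled)
at = transpose_int_keyed(int_keyed)
print(rejected, labels)  # True ['user-a', 'user-b']
```

In the transpose `at`, column 0 corresponds to `labels[0]` only by convention. This is the "not both" situation described above.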

Another way is to use Mahout NamedVectors for row labeling, but support for
them is not consistent across the elementary solvers.

