drill-dev mailing list archives

From "Henrik Behrens (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-325) Support for MADlib
Date Fri, 13 Dec 2013 13:38:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847481#comment-13847481 ]

Henrik Behrens commented on DRILL-325:

I strongly support this feature for the following reasons:
•	MADlib already supports a wide range of algorithms for machine learning, data mining and
statistics (see http://doc.madlib.net/latest/ for details)
•	MADlib is free and open source
•	MADlib is designed to eventually serve a role for scalable database systems that is similar
to the CRAN library for R: a community repository of statistical methods, this time written
with scale and parallelism in mind
•	MADlib is open for contributions of both new methods and ports to additional databases
•	MADlib is already supported on the Hadoop platform via HAWQ
•	A port of MADlib to Impala has already been started (http://blog.cloudera.com/blog/2013/10/how-to-use-madlib-pre-built-analytic-functions-with-impala/)
•	MADlib uses SQL and UDFs/UDAs for implementing analytical functions
•	MADlib supports iterative algorithms (in contrast to SQL)
•	MADlib supports templated queries (the same function can be applied to different tables,
in contrast to SQL)
•	MADlib contains additional sophisticated features and abstractions (Macroprogramming,
Microprogramming, Abstraction Layer for UDFs, Convex Optimization, Features for Statistical
Text Analysis)

For details please read their excellent paper: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-38.pdf

I think it is important that no decisions are made for Drill now that would later make it
difficult to port MADlib to Drill (e.g. missing support for iterative or templated
queries).
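
To make "iterative" and "templated" concrete, here is a minimal sketch of the driver pattern. It is not MADlib code and uses no MADlib or Drill API: SQLite (via Python's sqlite3 module) stands in for the database, and the names two_means, quote_ident, readings and scores are made up for this illustration. The query text is a template with the table and column filled in at call time, and the loop lives in a driver outside SQL that runs one aggregate query per iteration until the result stops changing.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so it can be spliced into a query template."""
    return '"' + name.replace('"', '""') + '"'


def two_means(conn, table, column, c1, c2, max_iter=50, tol=1e-9):
    """Hypothetical driver: 1-D k-means with k=2, one SQL aggregate per pass."""
    # Templated part: table and column are parameters of the query text.
    template = (
        "SELECT AVG(CASE WHEN ABS({col} - ?) <= ABS({col} - ?) THEN {col} END), "
        "AVG(CASE WHEN ABS({col} - ?) > ABS({col} - ?) THEN {col} END) "
        "FROM {tbl}"
    )
    sql = template.format(col=quote_ident(column), tbl=quote_ident(table))

    # Iterative part: the loop runs in the driver, not in SQL.
    for _ in range(max_iter):
        n1, n2 = conn.execute(sql, (c1, c2, c1, c2)).fetchone()
        # AVG yields NULL for an empty cluster; keep the old centroid then.
        n1 = c1 if n1 is None else n1
        n2 = c2 if n2 is None else n2
        converged = abs(n1 - c1) < tol and abs(n2 - c2) < tol
        c1, c2 = n1, n2
        if converged:
            break
    return c1, c2


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (x REAL)")
    conn.executemany("INSERT INTO readings VALUES (?)",
                     [(v,) for v in (1, 2, 3, 10, 11, 12)])
    conn.execute("CREATE TABLE scores (y REAL)")
    conn.executemany("INSERT INTO scores VALUES (?)",
                     [(v,) for v in (0, 0, 100, 100)])

    # The same function is applied to two different tables and columns.
    print(two_means(conn, "readings", "x", 0.0, 5.0))
    print(two_means(conn, "scores", "y", 10.0, 90.0))
```

An engine that cannot re-issue a parameterized aggregate with state carried over from the previous pass, or cannot take the table name as an argument, would make this pattern hard to port.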

> Support for MADlib
> ------------------
>                 Key: DRILL-325
>                 URL: https://issues.apache.org/jira/browse/DRILL-325
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Michael Hausenblas
> It should be possible to use MADlib (http://doc.madlib.net/latest/) with Drill.

This message was sent by Atlassian JIRA
