spark-dev mailing list archives

From Hemant Bhanawat <hemant9...@gmail.com>
Subject Re: mllib + SQL
Date Sat, 01 Sep 2018 15:34:47 GMT
Besides simplicity, SQL also provides a standard way of doing analysis
across multiple databases. Users would like to have that same standard
interface for machine learning as well.

The flexibility of Spark's API is definitely helpful, but new users want a
simple and standard way of doing machine learning.

IMO, SQL on ML should come as an incremental addition to Spark's
capabilities.


On Fri, Aug 31, 2018, 7:14 PM Sean Owen <srowen@gmail.com> wrote:

> My $0.02 -- this isn't worthwhile.
>
> Yes, there are ML-in-SQL tools. I'm thinking of MADlib, for example. I
> think these are holdovers from the days when someone's only interface to a
> data warehouse was SQL, and so there had to be SQL-language support for
> invoking ML jobs. There was no programmatic alternative.
>
> There's nothing particularly helpful about SQL as a language for
> expressing this, versus simply writing operations in a high-level
> programming language.
>
> Spark is that programmatic paradigm, and offers a more general way to
> express ETL, ML and SQL within their own appropriate DSLs. There's no need
> to also shoehorn Spark ML into Spark SQL.
>
> I also think there's a bit of false abstraction here. The nice thing about
> SQL-only access to these functions is that it sounds much simpler, and
> more accessible to people who only know SQL and nothing about Python or
> JVMs. In practice, using Spark means having some basic awareness of its
> distributed execution environment. SQL-only analysts would struggle to be
> effective with SQL-only access to Spark.
>
> On Fri, Aug 31, 2018 at 5:05 AM Hemant Bhanawat <hemant9379@gmail.com>
> wrote:
>
>> We allow our users to interact with the Spark cluster using SQL queries
>> only. That's easy for them. MLlib does not have SQL extensions, so we
>> cannot expose it to our users.
>>
>> SQL extensions can further accelerate MLlib's adoption. See
>> https://cloud.google.com/bigquery/docs/bigqueryml-intro.
>>
>> Hemant
>>
>>
>> On Thu, Aug 30, 2018 at 9:41 PM William Benton <willb@redhat.com> wrote:
>>
>>> What are you interested in accomplishing?
>>>
>>> The spark.ml package has provided a machine learning API based on
>>> DataFrames for quite some time.  If you are interested in mixing query
>>> processing and machine learning, this is certainly the best place to start.
>>>
>>> See here:  https://spark.apache.org/docs/latest/ml-guide.html
>>>
>>>
>>> best,
>>> wb
>>>
>>>
>>>
>>> On Thu, Aug 30, 2018 at 1:45 AM Hemant Bhanawat <hemant9379@gmail.com>
>>> wrote:
>>>
>>>> Is there a plan to support SQL extensions for MLlib? Or is there an
>>>> effort already underway?
>>>>
>>>> Any information is appreciated.
>>>>
>>>> Thanks in advance.
>>>> Hemant
>>>>
>>>
