Subject: Re: Spark SQL Thriftserver with HBase
From: Benjamin Kim <bbuild11@gmail.com>
Date: Sat, 8 Oct 2016 11:21:43 -0700
To: Felix Cheung <felixcheung_m@hotmail.com>
Cc: user@spark.apache.org
Message-Id: <3802D288-F627-4903-9774-415E45D33083@gmail.com>

Yes. I tried that with the hbase-spark package, but it didn't work. We were hoping it would. If it did, we would be using it for everything from Ad Servers to REST Endpoints and even Reporting Servers. I guess we will have to wait until they fix it.


On Oct 8, 2016, at 11:05 AM, Felix Cheung <felixcheung_m@hotmail.com> wrote:

Great, then I think those packages, as a Spark data source, should allow you to do exactly that (replace org.apache.spark.sql.jdbc with the HBase one).

I do think it will be great to get more examples around this though. Would be great if you could share your experience with this!


_____________________________
From: Benjamin Kim <bbuild11@gmail.com>
Sent: Saturday, October 8, 2016 11:00 AM
Subject: Re: Spark SQL Thriftserver with HBase
To: Felix Cheung <felixcheung_m@hotmail.com>
Cc: <user@spark.apache.org>


Felix,

My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using just SQL. I have been able to CREATE tables using the statement below in the past:

CREATE TABLE <table-name>
USING org.apache.spark.sql.jdbc
OPTIONS (
  url = "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
  dbtable "dim.dimension_acamp"
);

After doing this, I can access the PostgreSQL table through the Spark SQL JDBC Thriftserver using SQL statements (SELECT, UPDATE, INSERT, etc.). I want to do the same with HBase tables. We tried this using Hive and HiveServer2, but the response times are just too long.

Thanks,
Ben
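[Editor's note: an HBase analogue of the JDBC statement above would be a sketch along the following lines. It assumes the Hortonworks shc connector (its data source class name and JSON `catalog` option come from that project's documentation at the time); the table, column family, and column names are hypothetical, and the thread itself reports that the hbase-spark package did not work this way.]

```sql
-- Hypothetical HBase analogue of the JDBC example above, using the
-- shc connector's data source and its JSON `catalog` option.
-- Table, column family, and column names are made up for illustration.
CREATE TABLE dim_dimension_acamp
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  catalog '{
    "table":   {"namespace": "default", "name": "dim.dimension_acamp"},
    "rowkey":  "key",
    "columns": {
      "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
      "name": {"cf": "d",      "col": "name", "type": "string"}
    }
  }'
);
```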


On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheung_m@hotmail.com> wrote:

Ben,

I'm not sure I'm following completely.

Is your goal to use Spark to create or access tables in HBase? If so, the link below and several packages out there support that by providing an HBase data source for Spark. There are some examples of what the Spark code looks like in that link as well. On that note, you should also be able to use the HBase data source from a pure SQL (Spark SQL) query as well, which should work in the case with the Spark SQL JDBC Thrift Server (with USING, http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).


_____________________________
From: Benjamin Kim <bbuild11@gmail.com>
Sent: Saturday, October 8, 2016 10:40 AM
Subject: Re: Spark SQL Thriftserver with HBase
To: Felix Cheung <felixcheung_m@hotmail.com>
Cc: <user@spark.apache.org>


Felix,

The only alternative way is to create a stored procedure (a UDF, in database terms) that would run Spark Scala code underneath. That way, I can use the Spark SQL JDBC Thriftserver to execute it with SQL, passing the keys and values I want to UPSERT. I wonder if this is possible, since I cannot CREATE a wrapper table on top of an HBase table in Spark SQL?

What do you think? Is this the right approach?

Thanks,
Ben
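[Editor's note: the thread never settles whether such a procedure is possible from the Thriftserver, but the Scala code it would wrap could look roughly like the sketch below, using only the standard HBase 1.x client API. The table and column family names are hypothetical, and a running HBase cluster (with hbase-site.xml on the classpath) is assumed.]

```scala
// Minimal sketch of an HBase "upsert" helper such a procedure might wrap.
// In HBase a Put is already an upsert: it creates the row if absent and
// overwrites the cell if present. Table/column-family names are made up.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseUpsert {
  def upsert(rowKey: String, column: String, value: String): Unit = {
    val conf = HBaseConfiguration.create()   // reads hbase-site.xml
    val conn = ConnectionFactory.createConnection(conf)
    try {
      val table = conn.getTable(TableName.valueOf("dim.dimension_acamp"))
      val put = new Put(Bytes.toBytes(rowKey))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(column), Bytes.toBytes(value))
      table.put(put)                         // insert-or-overwrite
      table.close()
    } finally {
      conn.close()
    }
  }
}
```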

On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheung_m@hotmail.com> wrote:

HBase has released support for Spark: hbase.apache.org/book.html#spark

And if you search you should find several alternative = approaches.





On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuild11@gmail.com> wrote:

Does anyone know if Spark can work with HBase tables using Spark SQL? I know in Hive we are able to create tables on top of an underlying HBase table that can be accessed using MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to set up a way to GET and POST data to and from the HBase table using the Spark SQL JDBC thriftserver from our RESTful API endpoints and/or HTTP web farms. If we can get this to work, then we can load balance the thriftservers. In addition, this will benefit us by giving us a way to abstract the data storage layer away from the presentation layer code. There is a chance that we will swap out the data storage technology in the future. We are currently experimenting with Kudu.

Thanks,
Ben
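[Editor's note: the Hive-on-HBase pattern referred to above is the standard Hive HBase integration, sketched below. The storage handler class and property names are the documented ones; the table, column family, and column names are hypothetical.]

```sql
-- Standard Hive HBase integration (the pattern referred to above).
-- Table, column family, and column names are made up for illustration.
CREATE EXTERNAL TABLE dim_dimension_acamp (
  id   STRING,
  name STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:name")
TBLPROPERTIES ("hbase.table.name" = "dim.dimension_acamp");
```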
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org






