spark-user mailing list archives

From Benjamin Kim <bbuil...@gmail.com>
Subject Re: Spark SQL Thriftserver with HBase
Date Sat, 08 Oct 2016 18:26:20 GMT
Mich,

Are you talking about the Phoenix JDBC Server? If so, I forgot about that alternative.

Thanks,
Ben


> On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> I don't think it will work.
> 
> You can use Phoenix on top of HBase.
> 
> hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
> ROW                COLUMN+CELL
>  TSCO-1-Apr-08     column=stock_daily:Date,   timestamp=1475866783376, value=1-Apr-08
>  TSCO-1-Apr-08     column=stock_daily:close,  timestamp=1475866783376, value=405.25
>  TSCO-1-Apr-08     column=stock_daily:high,   timestamp=1475866783376, value=406.75
>  TSCO-1-Apr-08     column=stock_daily:low,    timestamp=1475866783376, value=379.25
>  TSCO-1-Apr-08     column=stock_daily:open,   timestamp=1475866783376, value=380.00
>  TSCO-1-Apr-08     column=stock_daily:stock,  timestamp=1475866783376, value=TESCO PLC
>  TSCO-1-Apr-08     column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
>  TSCO-1-Apr-08     column=stock_daily:volume, timestamp=1475866783376, value=49664486
> 
> And the same data via Phoenix on top of the HBase table:
> 
> 0: jdbc:phoenix:thin:url=http://rhes564:8765> select substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate,
>   "close" AS "Day's close", "high" AS "Day's High", "low" AS "Day's Low", "open" AS "Day's Open",
>   "ticker", "volume", (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice"
>   from "tsco"
>   where to_number("volume") > 0 and "high" != '-'
>     and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd')
>   order by to_date("Date",'dd-MMM-yy') limit 1;
> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
> |  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
> | 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
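
The Phoenix query above presupposes that a Phoenix table or view has already been mapped onto the existing HBase table. A minimal, untested sketch of such a mapping, with row key and column family/qualifier names taken from the scan output above; the VARCHAR types are assumptions (Phoenix cannot infer types for data written directly to HBase, which is why the query converts values with to_number() and to_date()):

```sql
-- Hypothetical mapping: expose the existing HBase table 'tsco' to Phoenix.
-- Column family/qualifier names come from the scan output above; the
-- VARCHAR types are assumed, with conversions deferred to query time.
CREATE VIEW "tsco" (
    pk                      VARCHAR PRIMARY KEY,  -- HBase row key, e.g. 'TSCO-1-Apr-08'
    "stock_daily"."Date"    VARCHAR,
    "stock_daily"."close"   VARCHAR,
    "stock_daily"."high"    VARCHAR,
    "stock_daily"."low"     VARCHAR,
    "stock_daily"."open"    VARCHAR,
    "stock_daily"."stock"   VARCHAR,
    "stock_daily"."ticker"  VARCHAR,
    "stock_daily"."volume"  VARCHAR
);
```

A view (rather than a Phoenix-managed table) is the usual choice when the HBase table already exists and is written to by other clients, since Phoenix then reads it in place without taking ownership of the data.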
> 
> HTH
> 
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage
> or destruction of data or any other property which may arise from relying on this email's
> technical content is explicitly disclaimed. The author will in no case be liable for any
> monetary damages arising from such loss, damage or destruction.
>  
> 
> On 8 October 2016 at 19:05, Felix Cheung <felixcheung_m@hotmail.com> wrote:
> Great, then those packages, as Spark data sources, should let you do exactly that
> (replace org.apache.spark.sql.jdbc with an HBase one).
> 
> I do think it would be great to get more examples around this, though. It would be
> great if you could share your experience with this!
> 
> 
> _____________________________
> From: Benjamin Kim <bbuild11@gmail.com>
> Sent: Saturday, October 8, 2016 11:00 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Felix Cheung <felixcheung_m@hotmail.com>
> Cc: <user@spark.apache.org>
> 
> 
> Felix,
> 
> My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using just SQL.
> In the past, I have been able to CREATE tables using the statement below:
> 
> CREATE TABLE <table-name>
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
>   dbtable "dim.dimension_acamp"
> );
> 
> After doing this, I can access the PostgreSQL table through the Spark SQL JDBC Thriftserver
> using SQL statements (SELECT, UPDATE, INSERT, etc.). I want to do the same with HBase tables.
> We tried this using Hive and HiveServer2, but the response times are just too long.
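
By analogy with the JDBC example above, an HBase-backed table would swap the JDBC source for an HBase data source package in the USING clause. A hedged sketch, assuming the Hortonworks spark-hbase connector (shc-core) is on the Thriftserver's classpath; the data source class name and the catalog JSON follow shc's documented conventions but are illustrative, not tested:

```sql
-- Sketch: register an HBase-backed table via an HBase data source package (shc).
-- The catalog JSON maps Spark SQL columns onto HBase column family/qualifier
-- pairs; table and column names here mirror the 'tsco' example in this thread.
CREATE TABLE tsco_hbase
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  catalog '{
    "table":   {"namespace": "default", "name": "tsco"},
    "rowkey":  "key",
    "columns": {
      "key":    {"cf": "rowkey",      "col": "key",    "type": "string"},
      "close":  {"cf": "stock_daily", "col": "close",  "type": "string"},
      "high":   {"cf": "stock_daily", "col": "high",   "type": "string"},
      "low":    {"cf": "stock_daily", "col": "low",    "type": "string"},
      "volume": {"cf": "stock_daily", "col": "volume", "type": "string"}
    }
  }'
);
```

Once registered this way, the table should be queryable through the Thriftserver like any other Spark SQL table, though how far UPDATE/UPSERT works depends on what the connector supports.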
> 
> Thanks,
> Ben
> 
> 
> On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheung_m@hotmail.com> wrote:
> 
> Ben,
> 
> I'm not sure I'm following completely.
> 
> Is your goal to use Spark to create or access tables in HBase? If so, the link below and
> several packages out there support that by providing an HBase data source for Spark. That
> link also has examples of what the Spark code looks like. On that note, you should also be
> able to use the HBase data source from a pure SQL (Spark SQL) query via the USING clause,
> which should work with the Spark SQL JDBC Thrift Server:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10
> 
> 
> _____________________________
> From: Benjamin Kim <bbuild11@gmail.com>
> Sent: Saturday, October 8, 2016 10:40 AM
> Subject: Re: Spark SQL Thriftserver with HBase
> To: Felix Cheung <felixcheung_m@hotmail.com>
> Cc: <user@spark.apache.org>
> 
> 
> Felix,
> 
> The only alternative I see is to create the database equivalent of a stored procedure
> (a UDF) that would run Spark Scala code underneath. That way, I could use the Spark SQL
> JDBC Thriftserver to execute it with SQL, passing the key/values I want to UPSERT. I wonder
> if this is possible, since I cannot CREATE a wrapper table on top of an HBase table in Spark SQL.
> 
> What do you think? Is this the right approach?
> 
> Thanks,
> Ben
> 
> On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheung_m@hotmail.com> wrote:
> 
> HBase has released support for Spark
> hbase.apache.org/book.html#spark
> 
> And if you search you should find several alternative approaches.
> 
> 
> 
> 
> 
> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuild11@gmail.com> wrote:
> 
> Does anyone know if Spark can work with HBase tables using Spark SQL? I know in Hive we
> are able to create tables on top of an underlying HBase table that can be accessed using
> MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to set
> up a way to GET and POST data to and from the HBase table using the Spark SQL JDBC
> Thriftserver from our RESTful API endpoints and/or HTTP web farms. If we can get this to
> work, then we can load-balance the thriftservers. In addition, this will give us a way to
> abstract the data storage layer away from the presentation-layer code, since there is a
> chance we will swap out the data storage technology in the future. We are currently
> experimenting with Kudu.
> 
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 
> 
> 
> 
> 
> 

