lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tao Yan <t...@linkedin.com>
Subject Re: Adding MySQL as a Data Source
Date Mon, 29 Aug 2016 20:32:16 GMT
Hi Rajat,

I created the storage and dimtable using the configurations you provided,
and the query failed because no partition is added to the table, so, I
added the following partition:

*<x_partition fact_or_dimension_table_name="dimension1_subset"
update_period="HOURLY"*
*  xmlns="uri:lens:cube:0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
<http://www.w3.org/2001/XMLSchema-instance>"*
*  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
*  <time_partition_spec>*
*    <part_spec_element key="dt" value="2016-08-25T00:00:00"/>*
*  </time_partition_spec>*
*</x_partition>*

And when I run the query again, Lens translate the Lens query to the
following SQL:



*lens-shell>select primary_key, attr1, attr5 from dimension129 Aug 2016
18:10:06 [Spring Shell] INFO  cliLogger - Query handle:
ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b029 Aug 2016 18:10:06 [Spring Shell]
INFO  cliLogger - User query: 'select primary_key, attr1, attr5 from
dimension1' was submitted to jdbc/mysql29 Aug 2016 18:10:06 [Spring Shell]
INFO  cliLogger -  Driver query: 'SELECT ( dimension1 . primary_key ), (
dimension1 . attr1 ), ( dimension1 . attr5 ) FROM mysql_dimension1_subset
dimension1 WHERE ((((dimension1.dt = 'latest'))))' and Driver handle: null**And
it failed because of 'unknown column' error:*

*29 Aug 2016 17:59:24 [Spring Shell] INFO  cliLogger - Query Status: Status
: FAILED Message : Query execution failed! Progress : 1.0 Error : Unknown
column 'dimension1_subset__lens_dimension1_subset_dimension1.dt' in 'where
clause'*

*This is expected because the column dt is not part of mysql table
**dimension1_subset.
I added the column to the table and also set the value to 'latest', then,
the query returned the result.*



*​-----------------*

*primary_key    attr1    attr5Results of query stored at :
hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv
<http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv>
 4 rows processed in (0) seconds.*


*I think this is not the way Lens should work with MySQL because sometimes
MySQL table cannot be changed, and it should not always be changed just
because Lens is going to query it. In Hive, when a newer partition is
added, Lens will automatically created a 'latest' partition, I wonder how
that work in MySQL or other databases. *

Is it possible to add a MySQL table to a partitioned dimtable without
specifying the partition column, and assume the added MySQL table is the
latest partition?

Thanks,

On Mon, Aug 29, 2016 at 12:46 AM, Rajat Khandelwal <rajatgupta59@gmail.com>
wrote:

> Use something like
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <x_storage xmlns="uri:lens:cube:0.1" name="mysql" classname="org.apache.lens.storage.db.DBStorage">
> </x_storage>
>
>
> <storage_table>
>   <update_periods>
>     <update_period>HOURLY</update_period>
>   </update_periods>
>   <storage_name>db</storage_name>
>   <table_desc external="true"  input_format="org.apache.hadoop.mapred.TextInputFormat"
output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" serde_class_name="org.apache.lens.storage.db.DBSerde"
storage_handler_name="org.apache.lens.storage.db.DBStorageHandler" num_buckets="0" compressed="false">
>     <part_cols>
>       <column name="pt" _type="string" comment="date partition"/>
>       <column name="dt" _type="string" comment="date partition"/>
>       <column name="ot" _type="string" comment="date partition"/>
>     </part_cols>
>     <table_parameters>
>       <property name="lens.metastore.native.table.name" value="table_name_in_mysql_db"/>
>       <property name="lens.metastore.native.db.name" value="mysql_db_name"/>
>       <property name="lens.metastore.native.table.column.mapping"
>                 value="pt=process_time,field1=field1_actual_name,field2=field2_actual_name"/>
>       <property name="cube.storagetable.start.times" value="now - 4 days"/>
>     </table_parameters>
>     <time_part_cols>pt</time_part_cols>
>     <time_part_cols>dt</time_part_cols>
>     <time_part_cols>ot</time_part_cols>
>   </table_desc>
> </storage_table>
>
>
> On Sat, Aug 27, 2016 at 3:42 AM Tao Yan <tyan@linkedin.com> wrote:
>
>> Hi Lens Developers,
>>
>> I am trying to add mysql as a data source, so, I created the driver as
>> follows:
>>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <configuration>
>>   <property>
>>     <name>lens.driver.jdbc.driver.class</name>
>>     <value>com.mysql.jdbc.Driver</value>
>>   </property>
>>   <property>
>>     <name>lens.driver.jdbc.db.uri</name>
>>     <value>jdbc:mysql://****mysql**hostname****/lens;user=****;
>> passowrd=****</value>
>>   </property>
>>   <property>
>>     <name>lens.driver.jdbc.db.user</name>
>>     <value>****</value>
>>   </property>
>>   <property>
>>     <name>lens.cube.query.driver.supported.storages</name>
>>     <value>mysql</value>
>>     <final>true</final>
>>   </property>
>>   <property>
>>     <name>lens.driver.jdbc.query.rewriter</name>
>>     <value>org.apache.lens.driver.jdbc.ColumnarSQLRewriter</value>
>>   </property>
>>   <property>
>>     <name>lens.driver.jdbc.explain.keyword</name>
>>     <value>explain plan for </value>
>>   </property>
>> </configuration>
>>
>> And created a test table with data in mysql:
>>
>> create table dimension1_subset (
>> primary_key BIGINT(8) not null primary key,
>> attr1 BIGINT(8),
>> attr5 BIGINT(8)
>> );
>>
>> [image: Screen Shot 2016-08-26 at 2.44.24 PM.png]
>> ​
>> And created the storage file:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <x_storage classname="org.apache.lens.storage.db.DBStorage" name="mysql"
>> xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <properties>
>>     *<property name="lens.storage.db.url"
>> value="jdbc:mysql://lva1-db34.corp.linkedin.com/lens
>> <http://lva1-db34.corp.linkedin.com/lens>"/>*
>>   </properties>
>> </x_storage>
>>
>> I am confused when set the value for *lens.storage.db.url *because* it
>> is neither local filesystem nor HDFS.*
>>
>> And I don't know what should the storage table look like:
>>
>> <x_dimension_table dimension_name="dimension1"
>> table_name="dimension1_table3" weight="5.0" xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <columns>
>>     <column name="primary_key" _type="BIGINT"/>
>>     <column name="attr1" _type="BIGINT"/>
>>     <column name="attr5" _type="BIGINT"/>
>>   </columns>
>>   <properties>
>>     <property name="dimension1.prop" value="t3"/>
>>   </properties>
>>   <storage_tables>
>>
>>
>> *    <storage_table>      ...    </storage_table>*
>>   </storage_tables>
>> </x_dimension_table>
>>
>> What should I use for the table_location, and what about part_cols? How
>> do I add data to this table?
>>
>> Thanks,
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Mime
View raw message