lens-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Puneet Gupta <puneet.gu...@inmobi.com>
Subject Re: Adding MySQL as a Data Source
Date Tue, 30 Aug 2016 08:33:17 GMT
Adding example for dimension table without partitions as mentioned by Rajat


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<x_dimension_table xmlns="uri:lens:cube:0.1" dimension_name="account"
table_name="account_mysql" weight="0.1">
    <columns>
        <column name="id" _type="int" comment=""/>
        <column name="account_name" _type="string" comment=""/>
        <column name="status" _type="string" comment=""/>
        <column name="enabled" _type="boolean" comment=""/>
    </columns>
    <storage_tables>
        <storage_table>
            <storage_name>mysql_storage</storage_name>
            <table_desc external="true"
input_format="org.apache.hadoop.mapred.TextInputFormat"
output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
serde_class_name="org.apache.lens.storage.db.DBSerde"
storage_handler_name="org.apache.lens.storage.db.DBStorageHandler"  >
                <table_parameters>
                    <property name="lens.metastore.native.table.name"
value="dim_account"/>
                    <property name="lens.metastore.native.db.name"
value="lensdb"/>
                </table_parameters>
            </table_desc>
        </storage_table>
    </storage_tables>
</x_dimension_table>


The actual table in mysql is lensdb.dim_account


Thanks,
Puneet Gupta

On Tue, Aug 30, 2016 at 12:33 PM, Rajat Khandelwal <rajatgupta59@gmail.com>
wrote:

> The idea of partitions in dimtables is based on snapshots. A metadata is
> either snapshotted (at some interval the entire metadata is added in a new
> partition), or it's fixed (like you mentioned). In that case, I believe
> removing the partition column declaration from the dimtable xml file should
> work. We have the same use case and are using it in production without any
> issue. Hive metadata is snpathotted hourly/daily and db metadata is fixed.
>
>
> On Tue, Aug 30, 2016 at 2:02 AM Tao Yan <tyan@linkedin.com> wrote:
>
>> Hi Rajat,
>>
>> I created the storage and dimtable using the configurations you provided,
>> and the query failed because no partition is added to the table, so, I
>> added the following partition:
>>
>> *<x_partition fact_or_dimension_table_name="dimension1_subset"
>> update_period="HOURLY"*
>> *  xmlns="uri:lens:cube:0.1"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
>> <http://www.w3.org/2001/XMLSchema-instance>"*
>> *  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">*
>> *  <time_partition_spec>*
>> *    <part_spec_element key="dt" value="2016-08-25T00:00:00"/>*
>> *  </time_partition_spec>*
>> *</x_partition>*
>>
>> And when I run the query again, Lens translate the Lens query to the
>> following SQL:
>>
>>
>>
>> *lens-shell>select primary_key, attr1, attr5 from dimension129 Aug 2016
>> 18:10:06 [Spring Shell] INFO  cliLogger - Query handle:
>> ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b029 Aug 2016 18:10:06 [Spring Shell]
>> INFO  cliLogger - User query: 'select primary_key, attr1, attr5 from
>> dimension1' was submitted to jdbc/mysql29 Aug 2016 18:10:06 [Spring Shell]
>> INFO  cliLogger -  Driver query: 'SELECT ( dimension1 . primary_key ), (
>> dimension1 . attr1 ), ( dimension1 . attr5 ) FROM mysql_dimension1_subset
>> dimension1 WHERE ((((dimension1.dt = 'latest'))))' and Driver handle: null**And
>> it failed because of 'unknown column' error:*
>>
>> *29 Aug 2016 17:59:24 [Spring Shell] INFO  cliLogger - Query Status:
>> Status : FAILED Message : Query execution failed! Progress : 1.0 Error :
>> Unknown column 'dimension1_subset__lens_dimension1_subset_dimension1.dt' in
>> 'where clause'*
>>
>> *This is expected because the column dt is not part of mysql table **dimension1_subset.
>> I added the column to the table and also set the value to 'latest', then,
>> the query returned the result.*
>>
>>
>>
>> *​-----------------*
>>
>> *primary_key    attr1    attr5Results of query stored at :
>> hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv
>> <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/ccb3b99a-22d6-4fbb-9bb8-9f06f633c4b0.csv>
>>  4 rows processed in (0) seconds.*
>>
>>
>> *I think this is not the way Lens should work with MySQL because
>> sometimes MySQL table cannot be changed, and it should not always be
>> changed just because Lens is going to query it. In Hive, when a newer
>> partition is added, Lens will automatically created a 'latest' partition, I
>> wonder how that work in MySQL or other databases. *
>>
>> Is it possible to add a MySQL table to a partitioned dimtable without
>> specifying the partition column, and assume the added MySQL table is the
>> latest partition?
>>
>> Thanks,
>>
>> On Mon, Aug 29, 2016 at 12:46 AM, Rajat Khandelwal <
>> rajatgupta59@gmail.com> wrote:
>>
>>> Use something like
>>>
>>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>>> <x_storage xmlns="uri:lens:cube:0.1" name="mysql" classname="org.apache.lens.storage.db.DBStorage">
>>> </x_storage>
>>>
>>>
>>> <storage_table>
>>>   <update_periods>
>>>     <update_period>HOURLY</update_period>
>>>   </update_periods>
>>>   <storage_name>db</storage_name>
>>>   <table_desc external="true"  input_format="org.apache.hadoop.mapred.TextInputFormat"
output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" serde_class_name="org.apache.lens.storage.db.DBSerde"
storage_handler_name="org.apache.lens.storage.db.DBStorageHandler" num_buckets="0" compressed="false">
>>>     <part_cols>
>>>       <column name="pt" _type="string" comment="date partition"/>
>>>       <column name="dt" _type="string" comment="date partition"/>
>>>       <column name="ot" _type="string" comment="date partition"/>
>>>     </part_cols>
>>>     <table_parameters>
>>>       <property name="lens.metastore.native.table.name" value="table_name_in_mysql_db"/>
>>>       <property name="lens.metastore.native.db.name" value="mysql_db_name"/>
>>>       <property name="lens.metastore.native.table.column.mapping"
>>>                 value="pt=process_time,field1=field1_actual_name,field2=field2_actual_name"/>
>>>       <property name="cube.storagetable.start.times" value="now - 4 days"/>
>>>     </table_parameters>
>>>     <time_part_cols>pt</time_part_cols>
>>>     <time_part_cols>dt</time_part_cols>
>>>     <time_part_cols>ot</time_part_cols>
>>>   </table_desc>
>>> </storage_table>
>>>
>>>
>>> On Sat, Aug 27, 2016 at 3:42 AM Tao Yan <tyan@linkedin.com> wrote:
>>>
>>>> Hi Lens Developers,
>>>>
>>>> I am trying to add mysql as a data source, so, I created the driver as
>>>> follows:
>>>>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>lens.driver.jdbc.driver.class</name>
>>>>     <value>com.mysql.jdbc.Driver</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>lens.driver.jdbc.db.uri</name>
>>>>     <value>jdbc:mysql://****mysql**hostname****/lens;user=****;
>>>> passowrd=****</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>lens.driver.jdbc.db.user</name>
>>>>     <value>****</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>lens.cube.query.driver.supported.storages</name>
>>>>     <value>mysql</value>
>>>>     <final>true</final>
>>>>   </property>
>>>>   <property>
>>>>     <name>lens.driver.jdbc.query.rewriter</name>
>>>>     <value>org.apache.lens.driver.jdbc.ColumnarSQLRewriter</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>lens.driver.jdbc.explain.keyword</name>
>>>>     <value>explain plan for </value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> And created a test table with data in mysql:
>>>>
>>>> create table dimension1_subset (
>>>> primary_key BIGINT(8) not null primary key,
>>>> attr1 BIGINT(8),
>>>> attr5 BIGINT(8)
>>>> );
>>>>
>>>> [image: Screen Shot 2016-08-26 at 2.44.24 PM.png]
>>>> ​
>>>> And created the storage file:
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>
>>>> <x_storage classname="org.apache.lens.storage.db.DBStorage"
>>>> name="mysql" xmlns="uri:lens:cube:0.1"
>>>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>>   <properties>
>>>>     *<property name="lens.storage.db.url"
>>>> value="jdbc:mysql://lva1-db34.corp.linkedin.com/lens
>>>> <http://lva1-db34.corp.linkedin.com/lens>"/>*
>>>>   </properties>
>>>> </x_storage>
>>>>
>>>> I am confused when set the value for *lens.storage.db.url *because* it
>>>> is neither local filesystem nor HDFS.*
>>>>
>>>> And I don't know what should the storage table look like:
>>>>
>>>> <x_dimension_table dimension_name="dimension1"
>>>> table_name="dimension1_table3" weight="5.0" xmlns="uri:lens:cube:0.1"
>>>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>>   <columns>
>>>>     <column name="primary_key" _type="BIGINT"/>
>>>>     <column name="attr1" _type="BIGINT"/>
>>>>     <column name="attr5" _type="BIGINT"/>
>>>>   </columns>
>>>>   <properties>
>>>>     <property name="dimension1.prop" value="t3"/>
>>>>   </properties>
>>>>   <storage_tables>
>>>>
>>>>
>>>> *    <storage_table>      ...    </storage_table>*
>>>>   </storage_tables>
>>>> </x_dimension_table>
>>>>
>>>> What should I use for the table_location, and what about part_cols? How
>>>> do I add data to this table?
>>>>
>>>> Thanks,
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> tyan@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Mime
View raw message