drill-user mailing list archives

From Chunhui Shi <c...@mapr.com>
Subject Re: Storage Plugin for accessing Hive ORC Table from Drill
Date Sat, 21 Jan 2017 07:34:15 GMT
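I guess you are using Hive 2.0 or later as the metastore server, while Drill bundles only the
Hive 1.2 libraries.


In Hive 2.0 and above, the delta directory name can contain more than one '_' as a separator,
whereas 1.2 expects only one '_'. That is why the 1.2 parser fails with the NumberFormatException
on "0000004_0000".


I think Drill should eventually be updated to use Hive's 2.0/2.1 libraries.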
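
To illustrate the mismatch, here is a minimal Java sketch. It is not Hive's actual AcidUtils code. It assumes the older parser splits the name once at the first '_' after the "delta_" prefix, and it uses two hypothetical directory names, chosen so that the tail of the second one matches the failing input "0000004_0000" from the stack trace. The class and method names are made up for this example.

```java
import java.util.Arrays;

// Illustrative sketch only; this is not Hive's AcidUtils code.
class DeltaNameSketch {
    static final String DELTA_PREFIX = "delta_";

    // Assumed 1.2-style parsing: split once at the first '_' after the
    // prefix, so the name must look like delta_<min>_<max>.
    static long[] parseTwoField(String dirName) {
        String rest = dirName.substring(DELTA_PREFIX.length());
        int split = rest.indexOf('_');
        long min = Long.parseLong(rest.substring(0, split));
        // Everything after the first '_' is parsed as a single number, so an
        // extra "_0000" suffix makes Long.parseLong throw.
        long max = Long.parseLong(rest.substring(split + 1));
        return new long[] {min, max};
    }

    // Tolerant parsing: accept an optional third field after <max>.
    // Returns {min, max, suffix}, with suffix = -1 when it is absent.
    static long[] parseWithOptionalSuffix(String dirName) {
        String[] parts = dirName.substring(DELTA_PREFIX.length()).split("_");
        long min = Long.parseLong(parts[0]);
        long max = Long.parseLong(parts[1]);
        long suffix = parts.length > 2 ? Long.parseLong(parts[2]) : -1L;
        return new long[] {min, max, suffix};
    }

    public static void main(String[] args) {
        String oldStyle = "delta_0000004_0000004";      // hypothetical name
        String newStyle = "delta_0000004_0000004_0000"; // hypothetical name

        System.out.println(Arrays.toString(parseTwoField(oldStyle)));
        try {
            parseTwoField(newStyle);
        } catch (NumberFormatException e) {
            System.out.println("two-field parser failed: " + e.getMessage());
        }
        System.out.println(Arrays.toString(parseWithOptionalSuffix(newStyle)));
    }
}
```

The two-field parser handles the first name but throws a NumberFormatException on the second, which is the same kind of failure seen at AcidUtils.parseDelta in the trace below.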

________________________________
From: Anup Tiwari <anup.tiwari@games24x7.com>
Sent: Friday, January 20, 2017 10:07:50 PM
To: user@drill.apache.org; dev@drill.apache.org
Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill

@Andries, We are using Hive 2.1.1 with Drill 1.9.0.

@Zelaine, regarding your question of whether this could be a problem in our
Hive metastore: as I mentioned earlier, I am able to read Hive Parquet tables
in Drill through the Hive storage plugin. Can you tell me a bit more about
which configuration I might be missing in the metastore?

Regards,
*Anup Tiwari*

On Sat, Jan 21, 2017 at 4:56 AM, Zelaine Fong <zfong@mapr.com> wrote:

> The stack trace shows the following:
>
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
> java.io.IOException: Failed to get numRows from HiveTable
>
> The Drill optimizer is trying to read rowcount information from Hive.
> Could this be a problem in your Hive metastore?
>
> Has anyone else seen this before?
>
> -- Zelaine
>
> On 1/20/17, 7:35 AM, "Andries Engelbrecht" <aengelbrecht@mapr.com> wrote:
>
>     What version of Hive are you using?
>
>
>     --Andries
>
>     ________________________________
>     From: Anup Tiwari <anup.tiwari@games24x7.com>
>     Sent: Friday, January 20, 2017 3:00:43 AM
>     To: user@drill.apache.org; dev@drill.apache.org
>     Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill
>
>     Hi,
>
>     Please find below the CREATE TABLE statement and the subsequent Drill error:
>
>     *Table Structure :*
>
>     CREATE TABLE `logindetails_all`(
>       `sid` char(40),
>       `channel_id` tinyint,
>       `c_t` bigint,
>       `l_t` bigint)
>     PARTITIONED BY (
>       `login_date` char(10))
>     CLUSTERED BY (
>       channel_id)
>     INTO 9 BUCKETS
>     ROW FORMAT SERDE
>       'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
>     STORED AS INPUTFORMAT
>       'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>     OUTPUTFORMAT
>       'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>     LOCATION
>       'hdfs://hostname1:9000/usr/hive/warehouse/logindetails_all'
>     TBLPROPERTIES (
>       'compactorthreshold.hive.compactor.delta.num.threshold'='6',
>       'compactorthreshold.hive.compactor.delta.pct.threshold'='0.5',
>       'transactional'='true',
>       'transient_lastDdlTime'='1484313383');
>
>     *Drill Error :*
>
>     *Query* : select * from hive.logindetails_all limit 1;
>
>     *Error :*
>     2017-01-20 16:21:12,625 [277e145e-c6bc-3372-01d0-6c5b75b92d73:foreman]
>     INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
>     277e145e-c6bc-3372-01d0-6c5b75b92d73: select * from hive.logindetails_all limit 1
>     2017-01-20 16:21:12,831 [277e145e-c6bc-3372-01d0-6c5b75b92d73:foreman]
>     ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR:
>     NumberFormatException: For input string: "0000004_0000"
>
>
>     [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
>     prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
>     org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
>     NumberFormatException: For input string: "0000004_0000"
>
>
>     [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
>     prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
>         at
>     org.apache.drill.common.exceptions.UserException$
> Builder.build(UserException.java:543)
>     ~[drill-common-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.work.foreman.Foreman$ForemanResult.
> close(Foreman.java:825)
>     [drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.work.foreman.Foreman.moveToState(
> Foreman.java:935)
>     [drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.
> java:281)
>     [drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>     [na:1.8.0_72]
>         at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>     [na:1.8.0_72]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
>     Caused by: org.apache.drill.exec.work.foreman.ForemanException:
> Unexpected
>     exception during fragment initialization: Internal error: Error while
>     applying rule DrillPushProjIntoScan, args
>     [rel#4220197:LogicalProject.NONE.ANY([]).[](input=rel#
> 4220196:Subset#0.ENUMERABLE.ANY([]).[],sid=$0,channel_id=$
> 1,c_t=$2,l_t=$3,login_date=$4),
>     rel#4220181:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hive,
>     logindetails_all])]
>         ... 4 common frames omitted
>     Caused by: java.lang.AssertionError: Internal error: Error while
> applying
>     rule DrillPushProjIntoScan, args
>     [rel#4220197:LogicalProject.NONE.ANY([]).[](input=rel#
> 4220196:Subset#0.ENUMERABLE.ANY([]).[],sid=$0,channel_id=$
> 1,c_t=$2,l_t=$3,login_date=$4),
>     rel#4220181:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hive,
>     logindetails_all])]
>         at org.apache.calcite.util.Util.newInternal(Util.java:792)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall.java:251)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.
> findBestExp(VolcanoPlanner.java:808)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.tools.Programs$RuleSetProgram.run(
> Programs.java:303)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.drill.exec.planner.sql.handlers.
> DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.handlers.
> DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.
> convertToDrel(DefaultSqlHandler.java:240)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.
> convertToDrel(DefaultSqlHandler.java:290)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(
> DefaultSqlHandler.java:168)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(
> DrillSqlWorker.java:123)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(
> DrillSqlWorker.java:97)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.work.foreman.Foreman.runSQL(
> Foreman.java:1008)
>     [drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.
> java:264)
>     [drill-java-exec-1.9.0.jar:1.9.0]
>         ... 3 common frames omitted
>     Caused by: java.lang.AssertionError: Internal error: Error occurred
> while
>     applying rule DrillPushProjIntoScan
>         at org.apache.calcite.util.Util.newInternal(Util.java:792)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoRuleCall.
> transformTo(VolcanoRuleCall.java:150)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.RelOptRuleCall.transformTo(
> RelOptRuleCall.java:213)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.
> onMatch(DrillPushProjIntoScan.java:90)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall.java:228)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 14 common frames omitted
>     Caused by: java.lang.reflect.UndeclaredThrowableException: null
>         at com.sun.proxy.$Proxy75.getNonCumulativeCost(Unknown Source)
> ~[na:na]
>         at
>     org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(
> RelMetadataQuery.java:115)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.
> getCost(VolcanoPlanner.java:1112)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(
> RelSubset.java:363)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(
> RelSubset.java:344)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.
> addRelToSet(VolcanoPlanner.java:1827)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.
> registerImpl(VolcanoPlanner.java:1760)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.
> register(VolcanoPlanner.java:1017)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(
> VolcanoPlanner.java:1037)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(
> VolcanoPlanner.java:1940)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         at
>     org.apache.calcite.plan.volcano.VolcanoRuleCall.
> transformTo(VolcanoRuleCall.java:138)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 17 common frames omitted
>     Caused by: java.lang.reflect.InvocationTargetException: null
>         at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
> ~[na:na]
>         at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>     ~[na:1.8.0_72]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72]
>         at
>     org.apache.calcite.rel.metadata.CachingRelMetadataProvider$
> CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:132)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 28 common frames omitted
>     Caused by: java.lang.reflect.UndeclaredThrowableException: null
>         at com.sun.proxy.$Proxy75.getNonCumulativeCost(Unknown Source)
> ~[na:na]
>         ... 32 common frames omitted
>     Caused by: java.lang.reflect.InvocationTargetException: null
>         at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
> ~[na:na]
>         at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>     ~[na:1.8.0_72]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72]
>         at
>     org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$
> ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 33 common frames omitted
>     Caused by: java.lang.reflect.UndeclaredThrowableException: null
>         at com.sun.proxy.$Proxy75.getNonCumulativeCost(Unknown Source)
> ~[na:na]
>         ... 37 common frames omitted
>     Caused by: java.lang.reflect.InvocationTargetException: null
>         at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
> ~[na:na]
>         at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>     ~[na:1.8.0_72]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72]
>         at
>     org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$
> 1$1.invoke(ReflectiveRelMetadataProvider.java:182)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 38 common frames omitted
>     Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
>     java.io.IOException: Failed to get numRows from HiveTable
>         at
>     org.apache.drill.exec.store.hive.HiveScan.getScanStats(
> HiveScan.java:233)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.physical.base.AbstractGroupScan.getScanStats(
> AbstractGroupScan.java:79)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.planner.logical.DrillScanRel.
> computeSelfCost(DrillScanRel.java:159)
>     ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at
>     org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.
> getNonCumulativeCost(RelMdPercentageOriginalRows.java:165)
>     ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>         ... 42 common frames omitted
>     Caused by: java.io.IOException: Failed to get numRows from HiveTable
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider.
> getStats(HiveMetadataProvider.java:113)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.store.hive.HiveScan.getScanStats(
> HiveScan.java:224)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         ... 45 common frames omitted
>     Caused by: java.lang.RuntimeException: serious problem
>         at
>     org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(
> OrcInputFormat.java:1021)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at
>     org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(
> OrcInputFormat.java:1048)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider$1.
> run(HiveMetadataProvider.java:253)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider$1.
> run(HiveMetadataProvider.java:241)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at java.security.AccessController.doPrivileged(Native Method)
>     ~[na:1.8.0_72]
>         at javax.security.auth.Subject.doAs(Subject.java:422)
> ~[na:1.8.0_72]
>         at
>     org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1657)
>     ~[hadoop-common-2.7.1.jar:na]
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider.
> splitInputWithUGI(HiveMetadataProvider.java:241)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider.
> getPartitionInputSplits(HiveMetadataProvider.java:142)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         at
>     org.apache.drill.exec.store.hive.HiveMetadataProvider.
> getStats(HiveMetadataProvider.java:105)
>     ~[drill-storage-hive-core-1.9.0.jar:1.9.0]
>         ... 46 common frames omitted
>     Caused by: java.util.concurrent.ExecutionException:
>     java.lang.NumberFormatException: For input string: "0000004_0000"
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     ~[na:1.8.0_72]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     ~[na:1.8.0_72]
>         at
>     org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(
> OrcInputFormat.java:998)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         ... 55 common frames omitted
>     Caused by: java.lang.NumberFormatException: For input string:
> "0000004_0000"
>         at
>     java.lang.NumberFormatException.forInputString(
> NumberFormatException.java:65)
>     ~[na:1.8.0_72]
>         at java.lang.Long.parseLong(Long.java:589) ~[na:1.8.0_72]
>         at java.lang.Long.parseLong(Long.java:631) ~[na:1.8.0_72]
>         at
>     org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:310)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at
>     org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(
> AcidUtils.java:379)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at
>     org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(
> OrcInputFormat.java:634)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at
>     org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(
> OrcInputFormat.java:620)
>     ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ~[na:1.8.0_72]
>         ... 3 common frames omitted
>
>
>
>
>     Regards,
>     *Anup Tiwari*
>
>     On Thu, Jan 19, 2017 at 9:18 PM, Andries Engelbrecht <aengelbrecht@mapr.com> wrote:
>
>     > I have not seen issues reading Hive ORC data with Drill.
>     >
>     >
>     > What is the DDL for the table in Hive?
>     >
>     >
>     > --Andries
>     >
>     > ________________________________
>     > From: Anup Tiwari <anup.tiwari@games24x7.com>
>     > Sent: Thursday, January 19, 2017 12:49:20 AM
>     > To: user@drill.apache.org
>     > Cc: dev@drill.apache.org
>     > Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill
>     >
>     > We have created an ORC-format table in Hive and were trying to read it in
>     > Drill through the Hive plugin, but it gives us an error. With the same Hive
>     > plugin, we are able to read a Parquet table created in Hive.
>     >
>     > After searching a bit, I found a Drill documentation link
>     > <https://drill.apache.org/docs/apache-drill-contribution-ideas/> which
>     > says that we have to create a custom storage plugin to read ORC-format
>     > tables. Can you tell me how to create a custom storage plugin in this case?
>     >
>     >
>     >
>     > Regards,
>     > *Anup Tiwari*
>     >
>     > On Thu, Jan 19, 2017 at 1:55 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>     >
>     > > Do you want to use the ORC files created by Hive directly in Drill, or
>     > > do you want to use them through Hive?
>     > >
>     > > On Thu, Jan 19, 2017 at 1:40 PM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
>     > >
>     > > > +Dev
>     > > >
>     > > > Can someone help me with this?
>     > > >
>     > > > Regards,
>     > > > *Anup Tiwari*
>     > > >
>     > > > On Sun, Jan 15, 2017 at 2:21 PM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
>     > > >
>     > > > > Hi Team,
>     > > > >
>     > > > > Can someone tell me how to configure a custom storage plugin in
>     > > > > Drill for accessing Hive ORC tables?
>     > > > >
>     > > > > Thanks in advance!!
>     > > > >
>     > > > > Regards,
>     > > > > *Anup Tiwari*
>     > > > >
>     > > >
>     > >
>     > >
>     > >
>     > > --
>     > > Nitin Pawar
>     > >
>     >
>
>
>
