drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Charles Givre <cgi...@gmail.com>
Subject Re: Apache Drill Queries Take so Long after which Webconsole dies and I don't see results
Date Wed, 01 Aug 2018 22:13:50 GMT
Hi Peter, 
To add to the discussion, I suspect the reason that this query is taking so long is the number
of times you are asking Drill to do a “full file scan”.  You might want to try using COUNT()
as a windowing function instead of using the subquery.  
—C

> On Aug 1, 2018, at 06:24, Peter Edike <peter.edike@interswitchgroup.com> wrote:
> 
> Hello everyone,
> 
> I am currently using apache drill to query a directory of parquet files (total size 64gb
with each file averaging 6.4Gb per size).  I am running the following query
> 
> select * from dfs.rj.vw_reports_201805 where unique_key in
>    (
>         select unique_key from dfs.rj.vw_reports_201805 group by unique_key having count(unique_key)
> 1
>    )
> limit 40
> 
> Now this query takes about 42 minutes after which it fails because one of the drillbits
in the drillbit cluster crashes. I check the logs and I see several errors summarized as follows
> 
> 20:35:18.006 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection
timed out for connection string (172.35.15.129:5181,172.35.15.151:5181,172.35.15.152:5181)
and timeout (5000) / elapsed (13347)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
[curator-client-2.7.1.jar:na]
>        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [curator-client-2.7.1.jar:na]
>        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
[curator-client-2.7.1.jar:na]
>        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
[curator-framework-2.7.1.jar:na]
>        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
[curator-framework-2.7.1.jar:na]
>        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
[curator-framework-2.7.1.jar:na]
>        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
[curator-framework-2.7.1.jar:na]
>        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_171]
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[na:1.8.0_171]
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_171]
>        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> AND
> 
> 
> Aug 1, 2018 7:54:18 AM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics
because created_by could not be parsed (see PARQUET-251): parquet-mr (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
> 
> org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr
(build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb) using format: (.+) version ((.*) )?\(build
?(.*)\)
> 
>        at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
> 
>        at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
> 
>        at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
> 
>        at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
> 
>        at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
> 
>        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
> 
>        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:404)
> 
>        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:390)
> 
>        at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:117)
> 
>        at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:53)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator$2.run(ImplCreator.java:146)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator$2.run(ImplCreator.java:142)
> 
>        at java.security.AccessController.doPrivileged(Native Method)
> 
>        at javax.security.auth.Subject.doAs(Subject.java:422)
> 
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1633)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:142)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:110)
> 
>        at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87)
> 
>        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:206)
> 
>        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> 
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 
>        at java.lang.Thread.run(Thread.java:748)
> 
> The Schema for the data I am querying in question is as follows
> 
> index_no
> 
> ANY
> 
> YES
> 
> bank_code
> 
> ANY
> 
> YES
> 
> trxn_category
> 
> ANY
> 
> YES
> 
> Debit_Account_Type
> 
> ANY
> 
> YES
> 
> Credit_Account_Type
> 
> ANY
> 
> YES
> 
> trxn_amount
> 
> ANY
> 
> YES
> 
> trxn_fee
> 
> ANY
> 
> YES
> 
> trxn_date
> 
> ANY
> 
> YES
> 
> Currency
> 
> ANY
> 
> YES
> 
> Late_reversal
> 
> ANY
> 
> YES
> 
> card_type
> 
> ANY
> 
> YES
> 
> terminal_type
> 
> ANY
> 
> YES
> 
> Acquirer
> 
> ANY
> 
> YES
> 
> Issuer
> 
> ANY
> 
> YES
> 
> Value_SettleAmount
> 
> ANY
> 
> YES
> 
> Value_RequestedAmount
> 
> ANY
> 
> YES
> 
> Volume
> 
> ANY
> 
> YES
> 
> unique_key
> 
> ANY
> 
> YES
> 
> source_node_name
> 
> ANY
> 
> YES
> 
> business_date
> 
> ANY
> 
> YES
> 
> PT_datetime_tran_local
> 
> ANY
> 
> YES
> 
> PTC_pan
> 
> ANY
> 
> YES
> 
> PTC_terminal_id
> 
> ANY
> 
> YES
> 
> PTC_card_acceptor_id_code
> 
> ANY
> 
> YES
> 
> PTC_card_acceptor_name_loc
> 
> ANY
> 
> YES
> 
> PT_system_trace_audit_nr
> 
> ANY
> 
> YES
> 
> PT_message_type
> 
> ANY
> 
> YES
> 
> PT_from_account_id
> 
> ANY
> 
> YES
> 
> PT_to_account_id
> 
> ANY
> 
> YES
> 
> PT_tran_type
> 
> ANY
> 
> YES
> 
> PT_rsp_code_req
> 
> ANY
> 
> YES
> 
> PT_rsp_code_rsp
> 
> ANY
> 
> YES
> 
> PT_settle_amount_req
> 
> ANY
> 
> YES
> 
> PT_settle_amount_rsp
> 
> ANY
> 
> YES
> 
> PTC_merchant_type
> 
> ANY
> 
> YES
> 
> PT_settle_amount_impact
> 
> ANY
> 
> YES
> 
> PT_settle_tran_fee_req
> 
> ANY
> 
> YES
> 
> PT_settle_tran_fee_rsp
> 
> ANY
> 
> YES
> 
> PT_auth_id_rsp
> 
> ANY
> 
> YES
> 
> PT_retrieval_reference_nr
> 
> ANY
> 
> YES
> 
> PTC_totals_group
> 
> ANY
> 
> YES
> 
> PTC_card_product
> 
> ANY
> 
> YES
> 
> PT_tran_currency_code
> 
> ANY
> 
> YES
> 
> PT_payee
> 
> ANY
> 
> YES
> 
> PT_tran_amount_req
> 
> ANY
> 
> YES
> 
> PT_tran_amount_rsp
> 
> ANY
> 
> YES
> 
> PT_extended_tran_type
> 
> ANY
> 
> YES
> 
> PT_tran_postilion_originated
> 
> ANY
> 
> YES
> 
> PT_tran_completed
> 
> ANY
> 
> YES
> 
> PT_tran_nr
> 
> ANY
> 
> YES
> 
> PT_retention_data
> 
> ANY
> 
> YES
> 
> PT_acquiring_inst_id_code
> 
> ANY
> 
> YES
> 
> PT_message_reason_code
> 
> ANY
> 
> YES
> 
> PT_sponsor_bank
> 
> ANY
> 
> YES
> 
> PT_datetime_tran_gmt
> 
> ANY
> 
> YES
> 
> PT_datetime_req
> 
> ANY
> 
> YES
> 
> PT_datetime_rsp
> 
> ANY
> 
> YES
> 
> PT_realtime_business_date
> 
> ANY
> 
> YES
> 
> PT_recon_business_date
> 
> ANY
> 
> YES
> 
> PT_from_account_type
> 
> ANY
> 
> YES
> 
> PT_to_account_type
> 
> ANY
> 
> YES
> 
> PT_tran_cash_req
> 
> ANY
> 
> YES
> 
> PT_tran_cash_rsp
> 
> ANY
> 
> YES
> 
> PT_tran_tran_fee_req
> 
> ANY
> 
> YES
> 
> PT_tran_tran_fee_rsp
> 
> ANY
> 
> YES
> 
> PT_tran_tran_fee_currency_code
> 
> ANY
> 
> YES
> 
> PT_tran_proc_fee_req
> 
> ANY
> 
> YES
> 
> PT_tran_proc_fee_rsp
> 
> ANY
> 
> YES
> 
> PT_tran_proc_fee_currency_code
> 
> ANY
> 
> YES
> 
> PT_settle_currency_code
> 
> ANY
> 
> YES
> 
> PT_pos_entry_mode
> 
> ANY
> 
> YES
> 
> PT_pos_condition_code
> 
> ANY
> 
> YES
> 
> PT_tran_reversed
> 
> ANY
> 
> YES
> 
> PT_card_verification_result
> 
> ANY
> 
> YES
> 
> PT_online_system_id
> 
> ANY
> 
> YES
> 
> PT_participant_id
> 
> ANY
> 
> YES
> 
> PT_receiving_inst_id_code
> 
> ANY
> 
> YES
> 
> PT_routing_type
> 
> ANY
> 
> YES
> 
> PT_pt_pos_card_input_mode
> 
> ANY
> 
> YES
> 
> PT_source_node_key
> 
> ANY
> 
> YES
> 
> PT_proc_online_system_id
> 
> ANY
> 
> YES
> 
> PTC_post_tran_cust_id
> 
> ANY
> 
> YES
> 
> PTC_source_node_name
> 
> ANY
> 
> YES
> 
> PTC_card_seq_nr
> 
> ANY
> 
> YES
> 
> PTC_expiry_date
> 
> ANY
> 
> YES
> 
> PTC_service_restriction_code
> 
> ANY
> 
> YES
> 
> PTC_terminal_owner
> 
> ANY
> 
> YES
> 
> PTC_mapped_card_acceptor_id_code
> 
> ANY
> 
> YES
> 
> PTC_pos_terminal_type
> 
> ANY
> 
> YES
> 
> PTC_pan_encrypted
> 
> ANY
> 
> YES
> 
> PTSP_Account_Nr
> 
> ANY
> 
> YES
> 
> PTSP_Code
> 
> ANY
> 
> YES
> 
> account_PTSP_Code
> 
> ANY
> 
> YES
> 
> PTSP_Name
> 
> ANY
> 
> YES
> 
> Sort_Code
> 
> ANY
> 
> YES
> 
> rdm_amt
> 
> ANY
> 
> YES
> 
> Reward_Code
> 
> ANY
> 
> YES
> 
> Reward_discount
> 
> ANY
> 
> YES
> 
> ptsp_terminal_id
> 
> ANY
> 
> YES
> 
> reward_terminal_id
> 
> ANY
> 
> YES
> 
> 
> txn_id
> 
> ANY
> 
> YES
> 
> web_category_code
> 
> ANY
> 
> YES
> 
> web_category_name
> 
> ANY
> 
> YES
> 
> web_fee_type
> 
> ANY
> 
> YES
> 
> web_merchant_disc
> 
> ANY
> 
> YES
> 
> web_fee_cap
> 
> ANY
> 
> YES
> 
> Account_Name
> 
> ANY
> 
> YES
> 
> account_nr
> 
> ANY
> 
> YES
> 
> Acquiring_bank
> 
> ANY
> 
> YES
> 
> journal_amount
> 
> ANY
> 
> YES
> 
> xls_amount
> 
> ANY
> 
> YES
> 
> merch_cat_amount_cap
> 
> ANY
> 
> YES
> 
> merch_cat_visa_amount_cap
> 
> ANY
> 
> YES
> 
> reward_amount_cap
> 
> ANY
> 
> YES
> 
> Amount_config_state
> 
> ANY
> 
> YES
> 
> Amount_description
> 
> ANY
> 
> YES
> 
> Authorized_Person
> 
> ANY
> 
> YES
> 
> ACC_BANK_CODE
> 
> ANY
> 
> YES
> 
> merch_cat_bearer
> 
> ANY
> 
> YES
> 
> merch_cat_visa_bearer
> 
> ANY
> 
> YES
> 
> merch_cat_visa_category_code
> 
> ANY
> 
> YES
> 
> merch_cat_category_code
> 
> ANY
> 
> YES
> 
> merch_cat_visa_category_name
> 
> ANY
> 
> YES
> 
> merch_cat_category_name
> 
> ANY
> 
> YES
> 
> credit_acc_id
> 
> ANY
> 
> YES
> 
> credit_acc_nr_id
> 
> ANY
> 
> YES
> 
> credit_cardholder_acc_id
> 
> ANY
> 
> YES
> 
> credit_cardholder_acc_type
> 
> ANY
> 
> YES
> 
> CreditAccNr_acc_id
> 
> ANY
> 
> YES
> 
> CreditAccNr_acc_nr
> 
> ANY
> 
> YES
> 
> CreditAccNr_acc_nr_id
> 
> ANY
> 
> YES
> 
> CreditAccNr_se_id
> 
> ANY
> 
> YES
> 
> CreditAccNr_state
> 
> ANY
> 
> YES
> 
> Date_Modified
> 
> ANY
> 
> YES
> 
> debit_acc_id
> 
> ANY
> 
> YES
> 
> debit_acc_nr_id
> 
> ANY
> 
> YES
> 
> debit_cardholder_acc_id
> 
> ANY
> 
> YES
> 
> debit_cardholder_acc_type
> 
> ANY
> 
> YES
> 
> DebitAccNr_acc_id
> 
> ANY
> 
> YES
> 
> DebitAccNr_acc_nr
> 
> ANY
> 
> YES
> 
> DebitAccNr_acc_nr_id
> 
> ANY
> 
> YES
> 
> extended_trans_type
> 
> ANY
> 
> YES
> 
> fee
> 
> ANY
> 
> YES
> 
> Fee_amount_id
> 
> ANY
> 
> YES
> 
> merch_cat_fee_cap
> 
> ANY
> 
> YES
> 
> merch_cat_visa_fee_cap
> 
> ANY
> 
> YES
> 
> reward_fee_cap
> 
> ANY
> 
> YES
> 
> Fee_description
> 
> ANY
> 
> YES
> 
> Fee_Discount
> 
> ANY
> 
> YES
> 
> Fee_fee_id
> 
> ANY
> 
> YES
> 
> fee_id
> 
> ANY
> 
> YES
> 
> Fee_name
> 
> ANY
> 
> YES
> 
> Fee_se_id
> 
> ANY
> 
> YES
> 
> merch_cat_category_fee_type
> 
> ANY
> 
> YES
> 
> merch_cat_category_visa_fee_type
> 
> ANY
> 
> YES
> 
> merch_cat_category_merch_discount
> 
> ANY
> 
> YES
> 
> merch_cat_category_visa_merch_discount
> 
> ANY
> 
> YES
> 
> PT_sink_node_name
> 
> ANY
> 
> YES
> 
> post_tran_id
> 
> ANY
> 
> YES
> 
> Miscellaneous
> 
> ANY
> 
> YES
> 
> terminal_owner
> 
> ANY
> 
> YES
> 
> terminal_owner_name
> 
> ANY
> 
> YES
> 
> REF_rpt_account_type
> 
> ANY
> 
> YES
> 
> REF_IsPoolAccount
> 
> ANY
> 
> YES
> 
> From_Account_Type
> 
> ANY
> 
> YES
> 
> To_Account_Type
> 
> ANY
> 
> YES
> 
> Tran_type_description
> 
> ANY
> 
> YES
> 
> Region
> 
> ANY
> 
> YES
> 
> Transaction_Status
> 
> ANY
> 
> YES
> 
> Transaction_type_Impact
> 
> ANY
> 
> YES
> 
> Message_Type_Desc
> 
> ANY
> 
> YES
> 
> Response_Code_description
> 
> ANY
> 
> YES
> 
> REF_BatchId
> 
> ANY
> 
> YES
> 
> Beneficiary_Account
> 
> ANY
> 
> YES
> 
> Rate
> 
> ANY
> 
> YES
> 
> final_fee
> 
> ANY
> 
> YES
> 
> final_fee_type
> 
> ANY
> 
> YES
> 
> settlement_date
> 
> ANY
> 
> YES
> 
> 
> Please what can I do to improve performance of this query
> 


Mime
View raw message