From: Abhishek Girish
Date: Tue, 12 Feb 2019 23:36:47 -0800
Subject: Re: HDFS storage prefix returning Error: VALIDATION ERROR: null
To: user@drill.apache.org

I meant for you to run show files in hdfs.tmp

But it looks like the plugin might not be initialized correctly (check if
the hostname provided in the connection string can be resolved)

Or you may not have used the right user when launching sqlline (user may
not have permissions on the hdfs root dir or somewhere in the file path).

On Tue, Feb 12, 2019 at 10:57 PM Krishnanand Khambadkone wrote:

> The command show files in dfs.tmp does return the right output.
> However when I try to run a simple hdfs query
>
> select s.application_id from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
>
> it returns,
>
> Error: VALIDATION ERROR: Schema [[hdfs]] is not valid with respect to
> either root schema or current default schema.
>
> On Tuesday, February 12, 2019, 5:10:57 PM PST, Abhishek Girish <agirish@apache.org> wrote:
>
> Can you please share the full error message (please see [1])
>
> Also, can you please see if this works: show files in dfs.tmp; This is to
> check if the DFS plugin is successfully initialized and Drill can see the
> files on HDFS. And if that works, check if simpler queries on the data
> works: select * from hdfs.``
>
> [1] https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors
>
> On Tue, Feb 12, 2019 at 4:38 PM Krishnanand Khambadkone wrote:
>
> > Here is the hdfs storage definition and query I am using. Same query
> > runs fine if run off local filesystem with dfs storage prefix. All I am
> > doing is swapping dfs for hdfs.
> >
> > {
> >   "type": "file",
> >   "connection": "hdfs://host18-namenode:8020/",
> >   "config": null,
> >   "workspaces": {
> >     "tmp": {
> >       "location": "/tmp",
> >       "writable": true,
> >       "defaultInputFormat": null,
> >       "allowAccessOutsideWorkspace": false
> >     },
> >     "root": {
> >       "location": "/",
> >       "writable": false,
> >       "defaultInputFormat": null,
> >       "allowAccessOutsideWorkspace": false
> >     }
> >   },
> >   "formats": null,
> >   "enabled": true
> > }
> >
> > select s.application_id,
> >   get_spark_attrs(s.spark_event,'spark.executor.memory') as spark_attributes
> > from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` s
> > where (REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"', ''),'(".*)','') = 'SparkListenerEnvironmentUpdate'
> >   or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"', ''),'(".*)','') = 'SparkListenerApplicationStart'
> >   or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"', ''),'(".*)','') = 'SparkListenerApplicationEnd')
> > group by application_id, spark_attributes
> > order by application_id;
> >
> > On Tuesday, February 12, 2019, 3:04:40 PM PST, Abhishek Girish <agirish@apache.org> wrote:
> >
> > Hey Krishnanand,
> >
> > As mentioned by other folks in earlier threads, can you make sure to
> > include ALL RELEVANT details in your emails? That includes the query,
> > storage plugin configuration, data format, sample data / description of the
> > data, the full log for the query failure? It's necessary if one needs to be
> > able to understand the issue or offer help.
> >
> > Regards,
> > Abhishek
> >
> > On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone wrote:
> >
> > > I have defined a hdfs storage type with all the required properties.
> > > However, when I try to use that in the query it returns
> > > Error: VALIDATION ERROR: null
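
The hostname check suggested at the top of this message (whether the host in the plugin's connection string can be resolved) could be sketched roughly as follows. This is only an illustration, not part of Drill: the helper names connection_host_port and can_resolve are hypothetical, and the plugin JSON is assumed to be available as a string, trimmed here to the keys that matter.

```python
import json
import socket
from urllib.parse import urlsplit

# Storage plugin definition from the thread, trimmed to the relevant keys.
PLUGIN_JSON = (
    '{"type": "file", '
    '"connection": "hdfs://host18-namenode:8020/", '
    '"enabled": true}'
)


def connection_host_port(plugin_json):
    """Return (host, port) parsed from the plugin's "connection" URI.

    Hypothetical helper for illustration; Drill does not ship this.
    """
    uri = json.loads(plugin_json)["connection"]
    parts = urlsplit(uri)
    return parts.hostname, parts.port


def can_resolve(host):
    """Return True if the local resolver can map host to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


host, port = connection_host_port(PLUGIN_JSON)
```

Calling can_resolve(host) on the machine that runs the Drillbit, as the same user that launches sqlline, would show whether the name in the connection string resolves from there. A True result only rules out name resolution; it says nothing about HDFS permissions or about whether the plugin is actually registered under the name hdfs.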