drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lee, David" <David....@blackrock.com>
Subject RE: How to write dynamic queries in Apache Drill to query Nested JSON ...
Date Thu, 28 Feb 2019 18:49:40 GMT
They would end up as two different columns.

You would have to merge them into a single column using a SQL statement

Select COALESCE(taste , type) as taste_or_type from ...

With that said I avoid using JSON these days with Apache Drill (or any SQL engine). Drill
appears to sample like 30,0000? Json records to figure out what the schema is. If you have
"new columns" after the first 30,000 json records they either don't show up will bomb your
query.

Today I just load up JSON files into a python program using json.loads() and then convert
it into a table format (pandas or list of dictionaries) before writing it out to a tabular
parquet file using pyarrow.

https://arrow.apache.org/docs/python/parquet.html

The only nested "json" format that is bullet proof is nested parquet which includes a full
schema definition in the header of the file so you don't have to rely on hit or miss data
sampling..

-----Original Message-----
From: Salil Sanghvi <salil.sanghvi@renianalytics.com> 
Sent: Thursday, February 28, 2019 12:35 AM
To: user@drill.apache.org
Subject: How to write dynamic queries in Apache Drill to query Nested JSON ...

External Email: Use caution with links and attachments


Hi,

We have a requirement to query HBASE table which has single column with nested json:
1. What will be the query to convert the JSON data  into table format.
2. Also the query should be generic enough to pick any new columns that may vary between different
json blocks.
3. In below example the 1st JSON block has attribute "taste" and second JSON block has attribute
"type", but the query should be generic enough to return both type and taste 4. Query should
be able to extract any new attribute introduced in future.

{
        "id": "0001",
        "taste": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
                {
                        "batter":
                                [
                                        { "id": "1001", "type": "Regular" },},
                                        { "id": "1004", "type": "Devil's Food" }
                                ]
                },
        "topping":
                [
                        { "id": "5001", "type": "None" },
                        { "id": "5002", "type": "Glazed" }
                ]
}

{
        "id": "0002",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
                {
                        "batter":
                                [
                                        { "id": "1001", "type": "Regular" },
                                        { "id": "1004", "type": "Devil's Food" }
                                ]
                },
        "topping":
                [
                        { "id": "5001", "type": "None" },
                        { "id": "5004", "type": "Maple" }
                ]
}


This message may contain information that is confidential or privileged. If you are not the
intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/compliance/email-disclaimers
for further information.  Please refer to http://www.blackrock.com/corporate/compliance/privacy-policy
for more information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.
Mime
View raw message