Is there a fundamental difference between the following two queries? I can't get the second
one working against Parquet files that contain 400,000+ nested records.
It seems like the system tries to flatten every record before applying the SQL WHERE
clause to the flattened data.
Example 1:
select b.* from dfs.`test1.json` b where b.item = 3

test1.json:
[
  {
    "item": 1,
    "item_name": "name_for_1"
  },
  {
    "item": 2,
    "item_name": "name_for_2"
  },
  {
    "item": 3,
    "item_name": "name_for_3"
  },
  {
    "item": 4,
    "item_name": "name_for_4"
  }
]
Example 2:
select b.* from
  (select flatten(a.details) as details
   from dfs.`test2.json` a) b
where b.details.item = 3

test2.json:
{
  "header": "my_header_info",
  "details": [
    {
      "item": 1,
      "item_name": "name_for_1"
    },
    {
      "item": 2,
      "item_name": "name_for_2"
    },
    {
      "item": 3,
      "item_name": "name_for_3"
    },
    {
      "item": 4,
      "item_name": "name_for_4"
    }
  ]
}
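
To make clear what result I expect from each query, here is a small Python sketch that mimics the two examples on the sample data above. It only models the logical result and says nothing about how the query engine actually plans or executes either query. The function names query1 and query2 are made up for illustration. Note that in the second case every nested element is expanded before the filter can be applied, which is the behaviour I suspect is hurting me at 400,000+ records.

```python
# Rows as in test1.json (Example 1): a flat list of items.
test1 = [{"item": i, "item_name": f"name_for_{i}"} for i in range(1, 5)]

# Row as in test2.json (Example 2): one header with a nested "details" array.
test2 = [{"header": "my_header_info", "details": list(test1)}]


def query1(rows):
    """Mimics: select b.* from test1.json b where b.item = 3"""
    return [row for row in rows if row["item"] == 3]


def query2(rows):
    """Mimics the flatten(a.details) subquery followed by the outer where clause."""
    # Logical effect of the flatten step: one output row per element of "details".
    flattened = ({"details": d} for row in rows for d in row["details"])
    # The filter can only run once the nested elements have been expanded.
    return [row for row in flattened if row["details"]["item"] == 3]


print(query1(test1))  # [{'item': 3, 'item_name': 'name_for_3'}]
print(query2(test2))  # [{'details': {'item': 3, 'item_name': 'name_for_3'}}]
```

Both return the same item, but the second result is wrapped in a details column, matching the alias in the subquery.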