drill-user mailing list archives

From "Lee, David" <David....@blackrock.com>
Subject RE: Json to Parquet
Date Fri, 08 Mar 2019 17:43:23 GMT
Nope, which is why I use Python with pyarrow to convert JSON to Parquet these days. Hopefully
Arrow / parquet-cpp will support Parquet dictionaries within a couple of months.

https://issues.apache.org/jira/browse/ARROW-1644

All of these JSON structures are problematic for any JSON schema-learning engine like
Drill.

File ABC.json is fine, but...
[
{"address": "1600 Pennsylvania Avenue", "zip_code": "20500"}
]

File XYZ.json will bomb:
[
{"address": "10 Downing Street", "zip_code": null}
]

There is no way to figure out what datatype zip_code is in the second file, since its only
value is null. I think Drill by default will save this as a BOOLEAN type, and now you have a
zip code column with both string and boolean values, which creates chaos and will result in
an exception.
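
To make the ambiguity concrete, here is a minimal Python sketch of naive per-file type
inference. It is only an illustration and not Drill's actual algorithm; the
infer_column_types helper and its type names are hypothetical.

```python
import json

def infer_column_types(json_text):
    """Map each field to the set of value types seen in one JSON file."""
    seen = {}
    for record in json.loads(json_text):
        for field, value in record.items():
            # JSON null (Python None) carries no type information at all.
            type_name = "null" if value is None else type(value).__name__
            seen.setdefault(field, set()).add(type_name)
    return seen

abc = '[{"address": "1600 Pennsylvania Avenue", "zip_code": "20500"}]'
xyz = '[{"address": "10 Downing Street", "zip_code": null}]'

abc_types = infer_column_types(abc)
xyz_types = infer_column_types(xyz)
print(abc_types["zip_code"])  # {'str'}
print(xyz_types["zip_code"])  # {'null'}
```

Looking at XYZ.json alone, nothing says that zip_code should be a string, so any type an
engine picks for it is a guess.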

The only clean way to solve these problems is to stop using schema learning and inject a schema
(https://json-schema.org/) into the query somehow.

I just gave up trying to use Drill to work with JSON and now use Python to read JSON and generate
Parquet datasets, which I can then use in Drill, etc.
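
A Python-to-Parquet workflow of this kind could look roughly like the sketch below. It
assumes a reasonably recent pyarrow (one that provides pa.Table.from_pylist); the helper
names coerce_records and write_parquet, the two-column all-string schema, and the output
path are hypothetical and chosen only for illustration.

```python
import json

# Assumed schema: both columns are declared as strings up front, so a null
# zip_code stays a null string instead of forcing a type guess.
COLUMNS = ("address", "zip_code")

def coerce_records(json_text, columns=COLUMNS):
    """Parse a JSON array and keep only the declared columns as str or None."""
    records = []
    for raw in json.loads(json_text):
        row = {}
        for name in columns:
            value = raw.get(name)
            row[name] = None if value is None else str(value)
        records.append(row)
    return records

def write_parquet(records, path, columns=COLUMNS):
    """Write the records to Parquet with an explicit all-string schema."""
    import pyarrow as pa            # imported lazily; requires pyarrow
    import pyarrow.parquet as pq

    schema = pa.schema([(name, pa.string()) for name in columns])
    table = pa.Table.from_pylist(records, schema=schema)
    pq.write_table(table, path)

rows = coerce_records('[{"address": "10 Downing Street", "zip_code": null}]')
# write_parquet(rows, "xyz.parquet")  # uncomment when pyarrow is installed
```

Because every file is written with the same declared schema, the resulting Parquet files
should agree on column types when they are later queried together.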


-----Original Message-----
From: Dweep Sharma <dweep.sharma@redbus.com> 
Sent: Friday, March 8, 2019 1:56 AM
To: user@drill.apache.org
Subject: reg: Json to Parquet

Hi,

I have a CTAS query that converts JSON to Parquet format, and it sometimes fails with this error:

 (org.apache.parquet.schema.InvalidSchemaException) Cannot write a schema with an empty group:
optional group address

I guess this happens when Drill encounters a field like "address" : {} (an empty object).

Is there a way to handle this?

Thanks,
-Dweep
