drill-user mailing list archives

From "Lee, David" <David....@blackrock.com>
Subject RE: Which perform better JSON or convert JSON to parquet format ?
Date Tue, 12 Jun 2018 14:45:22 GMT
Python supports tabular structures using pyarrow.

https://arrow.apache.org/docs/python/generated/pyarrow.schema.html

For nested structures like JSON you have to use C++ (parquet-cpp)

https://github.com/apache/parquet-cpp

More APIs still need to be developed for creating nested structures from JSON.

-----Original Message-----
From: Divya Gehlot [mailto:divya.htconex@gmail.com] 
Sent: Tuesday, June 12, 2018 5:25 AM
To: user@drill.apache.org
Subject: Re: Which perform better JSON or convert JSON to parquet format ?

Hi David,
How do I create the schema first using the Parquet library?
Can you please give an example?

Thanks,
Divya

On Tue, 12 Jun 2018 at 00:03, Lee, David <David.Lee@blackrock.com> wrote:

> Parquet is faster, especially if you are only looking for a subset of
> JSON objects. Every JSON key / array is treated as a column.
>
> With that said, creating Parquet from JSON is not bulletproof if you
> have really complex JSON which may have NULL values or many optional
> keys (Drill can't figure out what data type a NULL JSON value is and
> has trouble merging optional keys after sampling the first 20,000?
> records).
>
> If you are creating Parquet you should be using the Parquet libraries
> to define a consistent schema first. I've pretty much given up trying
> to create Parquet from JSON, which always ends in index-out-of-bounds
> (server-crashing) errors when trying to query the Parquet.
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Monday, June 11, 2018 4:47 AM
> To: user <user@drill.apache.org>
> Subject: Re: Which perform better JSON or convert JSON to parquet format ?
>
> Yes. Drill is good at JSON.
>
> But Parquet will be faster during a scan.
>
> Faster may be better. Or other things may be more important.
>
> You have to decide what is important to you. The great virtue of drill 
> is that you have the choice.
>
>
>
> On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot 
> <divya.htconex@gmail.com>
> wrote:
>
> > Thanks to all for your opinions!
> > Drill has been popularised as a complex JSON reader compared to
> > other tools in this space.
> > I was wondering whether Drill works better for JSON than for Parquet.
> >
>
>