Hello,
you can also use the Python wrapper pyarrow to create nested, JSON-like structures
in Python, for example with `pyarrow.array([[1, 2], [1], None, [12, 23, 23]])`.
Cheers
Uwe
On Tue, Jun 12, 2018, at 4:45 PM, Lee, David wrote:
> Python supports tabular structures using pyarrow.
>
> https://arrow.apache.org/docs/python/generated/pyarrow.schema.html
>
> For nested structures like JSON, you have to use C++ (parquet-cpp).
>
> https://github.com/apache/parquet-cpp
>
> We need more APIs developed to create nested JSON.
>
> -----Original Message-----
> From: Divya Gehlot [mailto:divya.htconex@gmail.com]
> Sent: Tuesday, June 12, 2018 5:25 AM
> To: user@drill.apache.org
> Subject: Re: Which perform better JSON or convert JSON to parquet format ?
>
> Hi David,
> How do I create the schema first using the Parquet library?
> Can you please give an example?
>
> Thanks,
> Divya
>
> On Tue, 12 Jun 2018 at 00:03, Lee, David <David.Lee@blackrock.com> wrote:
>
> > Parquet is faster, especially if you are only looking for a subset of
> > JSON objects. Every JSON key / array is treated as a column.
> >
> > With that said, creating Parquet from JSON is not bulletproof if you
> > have really complex JSON which may have NULL values or many optional
> > keys (Drill can't figure out what data type a NULL JSON value is and
> > has trouble merging optional keys after sampling the first 20,000?
> > records).
> >
> > If you are creating Parquet, you should be using the Parquet libraries
> > to define a consistent schema first. I've pretty much given up trying
> > to create Parquet from JSON, which always ends in index out of bounds
> > (server crashing) errors when trying to query the Parquet files.
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: Monday, June 11, 2018 4:47 AM
> > To: user <user@drill.apache.org>
> > Subject: Re: Which perform better JSON or convert JSON to parquet format ?
> >
> > Yes. Drill is good at JSON.
> >
> > But Parquet will be faster during a scan.
> >
> > Faster may be better. Or other things may be more important.
> >
> > You have to decide what is important to you. The great virtue of Drill
> > is that you have the choice.
> >
> >
> >
> > On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot <divya.htconex@gmail.com> wrote:
> >
> > > Thanks to all for your opinions!
> > > Since Drill has been popularised as a complex JSON reader compared to
> > > other tools in this space, I was wondering whether Drill works better
> > > with JSON than with Parquet.
> > >
> >
> >