parquet-dev mailing list archives

From Anna Szonyi <szo...@cloudera.com.INVALID>
Subject Re: Column index testing break down
Date Thu, 07 Mar 2019 15:57:58 GMT
Hi Wes,

Zoltan has created a C++ implementation for Impala. We would be happy to
contribute it to parquet-cpp when we have time. Alternatively, if someone is
keen on getting it in sooner and wants to take it over, we would be happy to
review it.
Feel free to check it out and chime in on the review of the Impala
implementation: https://gerrit.cloudera.org/#/c/12065/.

Best,
Anna

On Wed, Mar 6, 2019 at 4:17 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> Is there anyone who might be able to take on the project of
> implementing this in C++? We have an increasing number of C++
> Parquet users nowadays.
>
> On Tue, Mar 5, 2019 at 9:54 AM Anna Szonyi <szonyi@cloudera.com.invalid>
> wrote:
> >
> > Hi dev@ community,
> >
> > This week I would like to ask for some feedback on the testing we've been
> > sending out.
> > We've been sharing the most important test cases we've created for the
> > write path of the Parquet column index feature. Now we would like to hear
> > from you!
> >
> > Is there anything else you feel is missing or would like to get clarity
> > on?
> >
> > Thanks,
> > Anna
> >
> > On Mon, Feb 25, 2019 at 6:26 PM Anna Szonyi <szonyi@cloudera.com> wrote:
> >
> > > Hi dev@,
> > >
> > > After a week off, this week we have an excerpt from our internal data
> > > interoperability testing, which tests compatibility between Hive,
> > > Spark and Impala over Avro and Parquet. This test case is tailor-made
> > > to test specific layouts so that files written using parquet-mr can be
> > > read by any of the above-mentioned components. We have also checked
> > > fault injection cases.
> > >
> > > The test suite is currently private; however, we have made the test
> > > classes corresponding to the following document public:
> > >
> > > https://docs.google.com/document/d/1mHYQGXE4oM1zgg83MMc4ho1gmoJMeZcq9MWG99WgL3A
> > >
> > > Please find the test cases and their results here:
> > > https://github.com/zivanfi/column-indexes-data-interop-tests-excerpts
> > >
> > > Best,
> > > Anna
> > >
> > >
> > >
> > > On Mon, Feb 11, 2019 at 4:57 PM Anna Szonyi <szonyi@cloudera.com> wrote:
> > >
> > >> Hi dev@,
> > >>
> > > >> Last week we had a twofer: an e2e tool and an integration test
> > > >> validating the contract of column indexes/indices (whether all values
> > > >> are between min and max and, if the boundary order is set, whether it
> > > >> is correct). There are some takeaways and corrections to be made to
> > > >> the former (like the max->min typo) - thanks for the feedback on that!
> > >>
> > > >> The next installment is also an integration test. It tests the
> > > >> filtering logic on files, including simple and special cases
> > > >> (user-defined function, complex filtering, no filtering, etc.).
> > >>
> > >>
> > >>
> > > >> https://github.com/apache/parquet-mr/blob/e7db9e20f52c925a207ea62d6dda6dc4e870294e/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestColumnIndexFiltering.java
> > >>
> > >> Please let me know if you have any questions/comments.
> > >>
> > >> Best,
> > >> Anna
> > >>
> > >>
> > >>
> > >>
> > >>
>
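
The column index contract described in the Feb 11 message quoted above (every
value lies between its page's min and max, and the boundary order, if set,
matches the per-page min/max lists) can be illustrated with a rough sketch.
This is not the parquet-mr test or its API: the function name, the plain-list
representation of pages, and the string values for the boundary order are
assumptions made for illustration, and pages containing only nulls are not
modelled.

```python
from typing import List, Sequence


def check_column_index(
    pages: Sequence[Sequence[int]],
    mins: Sequence[int],
    maxs: Sequence[int],
    boundary_order: str = "UNORDERED",
) -> List[str]:
    """Return a list of contract violations (empty if none were found).

    Hypothetical helper: `pages` holds the non-null values of each page,
    `mins`/`maxs` hold the per-page bounds recorded in the column index.
    """
    problems: List[str] = []

    # Part 1: every value must lie within the bounds recorded for its page.
    for i, (values, lo, hi) in enumerate(zip(pages, mins, maxs)):
        for v in values:
            if not (lo <= v <= hi):
                problems.append(f"page {i}: value {v!r} outside [{lo!r}, {hi!r}]")

    def non_decreasing(xs: Sequence[int]) -> bool:
        return all(a <= b for a, b in zip(xs, xs[1:]))

    def non_increasing(xs: Sequence[int]) -> bool:
        return all(a >= b for a, b in zip(xs, xs[1:]))

    # Part 2: a declared order must hold for both the min and the max lists.
    if boundary_order == "ASCENDING":
        if not (non_decreasing(mins) and non_decreasing(maxs)):
            problems.append("boundary order ASCENDING does not hold")
    elif boundary_order == "DESCENDING":
        if not (non_increasing(mins) and non_increasing(maxs)):
            problems.append("boundary order DESCENDING does not hold")
    # UNORDERED makes no claim, so there is nothing to check.

    return problems
```

For example, check_column_index([[1, 3], [4, 6]], [1, 4], [3, 6], "ASCENDING")
reports no violations, while declaring the same pages as "DESCENDING" does.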
