spark-user mailing list archives

From Manuel Vonthron <mvonth...@mnubo.com>
Subject Which predicate pushdown work or does not work with Parquet?
Date Tue, 07 Nov 2017 00:29:29 GMT
Hi all,

I am trying to determine which predicate pushdowns work and which do not
with Spark+Parquet (mostly for versions 2.1.0 and/or 2.2.0).

I've read many pull request comments, JIRA tickets, and even the comments
in Parquet's source, but it's hard to get a clear picture of when a pushdown
is honoured depending on
  - the data type (Int? String? Timestamp?)
  - the operator involved (isNull, >=, ...)
  - and even the column name (is there a "." in it or not?)

The only types I consistently got working in my tests and reading are
"regular numbers", but support for Strings and Timestamps is crucial for my
use case.

Do you have any "reference" on this subject?


Additionally, here is a test I've been running with its results:
  https://gist.github.com/mvonthron/81cbd4a9060d3085711e5e142280dda6

There might be errors or misconfigured things but the TL;DR is: I only got
INTs and BOOLs to reliably work with no weirdness :|
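
For anyone repeating this kind of test, here is a minimal sketch of one way
to see which filters Spark reports as pushed for a given query. It is not
the code from the gist. It assumes PySpark, and it assumes the physical plan
printed by explain() contains a "PushedFilters: [...]" entry on the Parquet
scan node, as it commonly does in Spark 2.x. The helper names plan_text and
pushed_filters are hypothetical.

```python
import contextlib
import io
import re


def plan_text(df):
    """Capture the text that df.explain() prints to stdout.

    Assumption: df is a PySpark DataFrame, whose explain() prints the plan
    from Python rather than returning it.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def pushed_filters(plan):
    """Return the raw text inside each 'PushedFilters: [...]' entry.

    One string per scan node found in the plan; an empty string means the
    entry was present but listed no filters. The match stops at the first
    closing bracket, so filters that themselves print brackets (for
    example an In(...) list) would be cut short by this simple pattern.
    """
    return re.findall(r"PushedFilters: \[(.*?)\]", plan)
```

With a DataFrame df read from Parquet, a call such as
pushed_filters(plan_text(df.filter(df["some_column"] >= 5))) could then be
compared across column types and operators (some_column is a placeholder).
Note that this only shows what Spark reports handing to the data source; it
may not by itself prove that Parquet used the filter to skip row groups.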

Thanks!
Manuel



