spark-user mailing list archives

From matthes <>
Subject LATERAL VIEW explode requests the full schema
Date Tue, 03 Mar 2015 12:36:03 GMT
I use "LATERAL VIEW explode(...)" to read data from a Parquet file, but
Parquet requests the full schema instead of just the columns I use. When I
don't use LATERAL VIEW, the requested schema contains only the two columns
that I use. Is this correct, is there room for an optimization, or am I
misunderstanding something?

Here are my examples:

1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509") 

The requested schema is:

  optional group observedDays (LIST) {
    repeated int32 array;
  }
  required int64 userid;

This is the schema I expect. The query itself does not return a correct
result, but that is not the problem here.

2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW
explode(observeddays) od AS observed WHERE observed==20140509")     

The requested schema is:

  required int64 userid;
  optional int32 source;
  optional group observedDays (LIST) {
    repeated int32 array;
  }
  optional group placetobe (LIST) {
    repeated group bag {
      optional group array {
        optional binary palces (UTF8);
        optional group dates (LIST) {
          repeated int32 array;
        }
      }
    }
  }

Why does Parquet request the full schema? I use only two of the fields.

Can somebody please explain why this happens?
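
To make my expectation concrete, here is a minimal sketch in plain Python.
It is not Spark or Parquet code: the function name
expected_requested_columns and the flattened list of top-level columns are
only illustrative assumptions. It shows the column pruning I would expect
for both queries:

```python
# Top-level columns of the table, taken from the full requested schema above.
# (Illustrative simplification: nested fields are ignored.)
FULL_SCHEMA = ["userid", "source", "observedDays", "placetobe"]


def expected_requested_columns(full_schema, referenced):
    """Return the top-level columns a pruned read would need.

    Matching is case-insensitive because the queries use lower-case names
    (observeddays) while the Parquet schema uses observedDays.
    """
    wanted = {name.lower() for name in referenced}
    return [col for col in full_schema if col.lower() in wanted]


# Query 1 references userid and observeddays directly.
query1 = expected_requested_columns(FULL_SCHEMA, ["userid", "observeddays"])

# Query 2 references userid, and observeddays through explode(). The alias
# "observed" is derived from observeddays, so it adds no new column.
query2 = expected_requested_columns(FULL_SCHEMA, ["userid", "observeddays"])

print(query1)
print(query2)
```

In this sketch both queries need the same two columns, which is why I
expected the same requested schema with and without LATERAL VIEW.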

