spark-user mailing list archives

From matthes <matthias.diekst...@web.de>
Subject LATERAL VIEW explode requests the full schema
Date Tue, 03 Mar 2015 12:36:03 GMT
I use "LATERAL VIEW explode(...)" to read data from a Parquet file, but
Parquet is asked for the full schema instead of just the columns I use. When
I don't use LATERAL VIEW, the requested schema contains only the two columns
I use. Is this correct, is there room for an optimization, or am I
misunderstanding something?

Here are my examples:

1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509") 

The requested schema is:

  optional group observedDays (LIST) {
    repeated int32 array;
  }
  required int64 userid;
}

This is what I expect, although the query itself does not give a working
result; that is not the problem here!

2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW explode(observeddays) od AS observed WHERE observed==20140509")

The requested schema is:

  required int64 userid;
  optional int32 source;
  optional group observedDays (LIST) {
    repeated int32 array;
  }
  optional group placetobe (LIST) {
    repeated group bag {
      optional group array {
        optional binary palces (UTF8);
        optional group dates (LIST) {
          repeated int32 array;
        }
      }
    }
  }
}

Why does Parquet request the full schema? I use only two fields of the
table.

Can somebody please explain why this happens?
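
To make the expectation concrete, here is a minimal sketch in plain Python.
It is a toy model of column pruning, not Spark's or Parquet's actual logic;
the helper pruned_columns and the TABLE_COLUMNS list are hypothetical names,
with the column names taken from the full schema above. It only shows that
both queries reference the same two stored columns:

```python
# Toy model of column pruning. This is NOT Spark's or Parquet's implementation;
# it only spells out which stored columns each query refers to.

# Top-level columns of the table "pef", taken from the full requested schema.
TABLE_COLUMNS = ["userid", "source", "observedDays", "placetobe"]


def pruned_columns(table_columns, referenced):
    """Return the table columns a query refers to, in table order.

    Names are matched case-insensitively, because the queries use lower-case
    names while the Parquet schema uses mixed case (hypothetical helper).
    """
    wanted = {name.lower() for name in referenced}
    return [col for col in table_columns if col.lower() in wanted]


# Query 1 references userid (SELECT) and observeddays (WHERE).
query1 = pruned_columns(TABLE_COLUMNS, ["userid", "observeddays"])

# Query 2 references userid (SELECT) and observeddays (input to explode).
# "observed" is produced by explode, so it is not a stored column.
query2 = pruned_columns(TABLE_COLUMNS, ["userid", "observeddays"])

print(query1)
print(query2)
```

Under this model neither query would need source or placetobe, which is why
the full schema in example 2 surprises me.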

Thanks!

