spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From matthes <mdiekst...@sensenetworks.com>
Subject Re: Is it possible to use Parquet with Dremel encoding
Date Fri, 26 Sep 2014 15:48:08 GMT
Hi Frank,

thanks al lot for your response, this is a very helpful!

Actually I'm try to figure out does the current spark version supports
Repetition levels
(https://blog.twitter.com/2013/dremel-made-simple-with-parquet) but now it
looks good to me.
It is very hard to find some good things about that. Now I found this as
well: 
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala;h=1dc58633a2a68cd910c1bab01c3d5ee1eb4f8709;hb=f479cf37

I wasn't sure of that because nested data can be many different things!
If it works with SQL, to find the firstRepeatedid or secoundRepeatedid would
be awesome. But if it only works with kind of map/reduce job than it also
good. The most important thing is to filter the first or secound  repeated
value as fast as possible and in combination as well.
I start now to play with this things to get the best search results!

Me schema looks like this:

val nestedSchema =
    """message nestedRowSchema 
{
		  int32 firstRepeatedid;
		  repeated group level1
		  {
		  	int64 secoundRepeatedid;
		  	repeated group level2 
		      {
		      	int64	value1;
		      	int32	value2;
		      }
		  }
	}
    """

Best,
Matthes



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-use-Parquet-with-Dremel-encoding-tp15186p15239.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Mime
View raw message