spark-user mailing list archives

From matthes <mdiekst...@sensenetworks.com>
Subject Is it possible to use Parquet with Dremel encoding
Date Fri, 26 Sep 2014 00:05:43 GMT
Hi again!

I am currently trying to use Parquet, and I want to keep the data in memory
in an efficient layout so that queries against it are as fast as possible.
I read that Parquet can encode nested columns, using the Dremel encoding
with definition and repetition levels.
Is it currently possible to use this from Spark, or is it not implemented
yet? If it is, I’m not sure how to do it. I saw some examples that put
arrays or case classes inside other case classes, but I don’t think that is
the right way. The other thing I came across in this context was SchemaRDDs.
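
To make clearer what I mean by repetition levels, here is a small sketch in
plain Python. It is not Spark or Parquet API, just my understanding of the
Dremel idea. The record, the field names (id, groups, key, values) and the
function name are made up for illustration, and I assume every list is
non-empty, so definition levels are left out.

```python
# Hypothetical nested record: a repeated "groups" field, each entry holding
# a repeated "values" leaf. All names and numbers are illustrative only.
record = {
    "id": 1,
    "groups": [
        {"key": "a", "values": [10, 20]},
        {"key": "b", "values": [30]},
    ],
}


def values_levels(rec):
    """Return (value, repetition_level) pairs for the 'values' leaf column.

    0 = first value of a new record
    1 = value starts a new entry of 'groups'
    2 = value continues the current 'values' list
    Assumes no empty lists, so no NULL placeholders are emitted.
    """
    out = []
    for group_index, group in enumerate(rec["groups"]):
        for value_index, value in enumerate(group["values"]):
            if group_index == 0 and value_index == 0:
                level = 0
            elif value_index == 0:
                level = 1
            else:
                level = 2
            out.append((value, level))
    return out


print(values_levels(record))
```

As far as I understand it, the levels are what lets the flat column of
values be turned back into the nested lists.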

Input:

Col1 | Col2 | Col3 | Col4
int  | long | long | int
-------------------------
14   | 1234 | 1422 | 3
14   | 3212 | 1542 | 2
14   | 8910 | 1422 | 8
15   | 1234 | 1542 | 9
15   | 8897 | 1422 | 13

Wanted Parquet layout (“ means the value above is repeated):
Col3 | Col1 | Col4 | Col2
long | int  | int  | long
-------------------------
1422 | 14   | 3    | 1234
“    | “    | 8    | 8910
“    | 15   | 13   | 8897
1542 | 14   | 2    | 3212
“    | 15   | 9    | 1234
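
In case the table is unclear, this is roughly the grouping I have in mind,
sketched in plain Python (not Spark code). The function name nest and the
dict-of-dicts shape are only my illustration of the nesting
Col3 -> Col1 -> list of (Col4, Col2). How to express this with Spark and
Parquet is exactly what I am asking about.

```python
from collections import defaultdict

# Rows from the input table: (Col1, Col2, Col3, Col4).
rows = [
    (14, 1234, 1422, 3),
    (14, 3212, 1542, 2),
    (14, 8910, 1422, 8),
    (15, 1234, 1542, 9),
    (15, 8897, 1422, 13),
]


def nest(flat_rows):
    """Group flat rows as Col3 -> Col1 -> list of (Col4, Col2) pairs."""
    grouped = defaultdict(lambda: defaultdict(list))
    for col1, col2, col3, col4 in flat_rows:
        grouped[col3][col1].append((col4, col2))
    # Sort the keys on both levels so the order matches the wanted table.
    return {
        col3: {col1: pairs for col1, pairs in sorted(by_col1.items())}
        for col3, by_col1 in sorted(grouped.items())
    }


nested = nest(rows)

# Print one line per leaf pair, in the column order of the wanted table.
for col3, by_col1 in nested.items():
    for col1, pairs in by_col1.items():
        for col4, col2 in pairs:
            print(col3, col1, col4, col2)
```

Each Col3 value then appears once, each Col1 value once per Col3, and only
the (Col4, Col2) pairs are stored per row.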

It would be awesome if somebody could give me a good hint on how to do
that, or suggest a better way.

Best,
Matthes




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-use-Parquet-with-Dremel-encoding-tp15186.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
