hbase-user mailing list archives

From Krishna <research...@gmail.com>
Subject HFile vs Parquet for very wide table
Date Thu, 21 Jan 2016 22:43:27 GMT
We are evaluating Parquet and HBase for storing a dense and very wide
matrix (potentially more than 600K columns).

I have the following questions:

   - Is there a limit on the number of columns in Parquet or HFile? We
   expect to query 10-100 columns at a time using Spark - what are the
   performance implications in this scenario?
   - HBase can support millions of columns - does anyone have prior
   experience comparing Parquet vs. HFile performance for wide structured
   tables?
   - We want a schema-less solution, since the matrix can get wider over
   time.
   - Is there a way to generate wide, structured, schema-less Parquet
   files using map-reduce (the input files are in a custom binary format)?

What solutions other than Parquet & HBase would be useful for this use case?
