drill-dev mailing list archives

From Asaf Mesika <asaf.mes...@gmail.com>
Subject Re: Cloudera impala
Date Wed, 24 Oct 2012 19:43:05 GMT
From what I managed to grasp while reading their Java front-end, their HBase support is very
limited:
- RowKey support is limited to Strings only, and I think to only one value (field)
  - Occupies more space, and does not allow a composite primary key
- WHERE is supported only for String-type fields
- No support for integer-type fields, so you are forced to occupy more space when saving your
  data
- No support for binary encoding of a couple of fields, although they did mention they are using
  Avro - couldn't find it yet.
  - This means each column is saved as a real HBase column -> occupies space and might hinder
    performance on very large data sets
- No support for secondary indexes
- No support for user-defined functions in SELECT, or as a whole (stored-procedure-like)
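
To illustrate the space point above, here is a minimal sketch in Python. It is not Impala or HBase code: the function names and the two-field key layout are made up for illustration. It only compares storing an integer as decimal text against a fixed-width binary encoding, and shows how two integer fields could be packed into one composite key.

```python
import struct

def encode_as_string(n: int) -> bytes:
    """Store the integer as its decimal text, as a strings-only schema would."""
    return str(n).encode("utf-8")

def encode_as_binary(n: int) -> bytes:
    """Store the integer as a fixed-width 8-byte big-endian signed value."""
    return struct.pack(">q", n)

def composite_key(user_id: int, timestamp: int) -> bytes:
    """Pack two integer fields into one 16-byte key (hypothetical layout)."""
    return struct.pack(">qq", user_id, timestamp)

value = 1351107785000  # a 13-digit millisecond timestamp
string_size = len(encode_as_string(value))  # one byte per decimal digit
binary_size = len(encode_as_binary(value))  # always 8 bytes

# Both fields can be recovered from the packed key.
key = composite_key(42, value)
user_id, timestamp = struct.unpack(">qq", key)
```

The gap grows with the number of digits, and it is paid on every cell, because HBase stores the row key along with each cell.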

All in all, the code seems very solid:
- BE (back-end) written in C++
  - Could be good for performance, but I'm not sure it's worth the overhead if you're running
    on HBase, which is likely to be the cause of any bottleneck or slow query.
- BE - FE streaming communication, which supports canceling a query in the middle of execution,
  and I guess allows for streaming results to the client
- FE (front-end) written in Java.
- Communication between the Java FE and the BE uses Thrift
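
As a rough picture of what streaming with mid-query cancellation buys you, here is a small Python sketch. It is an assumption-laden illustration, not Impala's actual BE - FE protocol: `stream_batches` and the cancellation flag are hypothetical names, and a plain list stands in for the query result.

```python
import threading
from typing import Iterator, List

def stream_batches(rows: List[int], batch_size: int,
                   cancelled: threading.Event) -> Iterator[List[int]]:
    """Yield result batches one at a time; stop once cancellation is requested."""
    for start in range(0, len(rows), batch_size):
        if cancelled.is_set():
            return  # the consumer asked to cancel, so produce nothing further
        yield rows[start:start + batch_size]

cancel = threading.Event()
received = []
for batch in stream_batches(list(range(10)), 3, cancel):
    received.append(batch)   # the client sees each batch as soon as it is ready
    if len(received) == 2:
        cancel.set()         # cancel in the middle of execution
```

The consumer never waits for the full result set, and the producer stops doing work as soon as the flag is set.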


Asaf Mesika


On 24 Oct 2012, at 19:04, "Surendra, Manchikanti" <surendra.manchikanti@gmail.com>
wrote:

> Impala supports HDFS and HBase.
> 
> Thanks,
> -- Surendra Manchikanti
> 
> 
> On Wed, Oct 24, 2012 at 10:23 PM, Arun Ramakrishnan <
> sinchronized.arun@gmail.com> wrote:
> 
>> 1. It's Apache licensed
>> <http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/>
>>
>> 2. Trevni columnar format is something to look into.
>> 
>> On Wed, Oct 24, 2012 at 9:18 AM, karthik tunga <karthik.tunga@gmail.com>
>> wrote:
>> 
>>> 
>>> 
>>> http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
>>> 
>>> It looks like impala has HDFS support as well.
>>> 
>>> Cheers,
>>> Karthik
>>> 
>>> On 24 October 2012 12:16, kulkarni.swarnim@gmail.com <
>>> kulkarni.swarnim@gmail.com> wrote:
>>> 
>>>>> It's licensed to Cloudera only.
>>>> 
>>>> I don't think that is entirely true.
>>>> 
>>>> "*Cloudera Impala is an Apache-licensed open source project that was
>>>> founded and is led by Cloudera.*"[1]
>>>> 
>>>> [1] http://www.cloudera.com/content/cloudera/en/products/cloudera-enterprise-core/cloudera-enterprise-RTQ.html
>>>> 
>>>> On Wed, Oct 24, 2012 at 11:11 AM, Timothy Chen <tnachen@gmail.com>
>>>> wrote:
>>>> 
>>>>> I think this is right up our alley.
>>>>> 
>>>>> Http://github.com/Cloudera/impala
>>>>> 
>>>>> It's licensed to Cloudera only.
>>>>> 
>>>>> Supports LLVM IR and looks like it's planning to support all the different
>>>>> formats, like we do.
>>>>> 
>>>>> Tim
>>>>> 
>>>>> Sent from my iPhone
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Swarnim
>>>> 
>>> 
>> 

