Regarding comparisons: how does Drill compare to Druid, which is also an in-memory warehouse?
Does Drill support joins to in-memory dimension tables, unlike Druid? Does it have any limit
on the number of records it can fetch, etc.?
Regards,
Amit
> On May 16, 2014, at 8:46 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:
>
> Ted covered the most important points. I just want to add a few
> clarifications.
>
> While the code for Drill so far is written in pure Java, there is no
> specific requirement that all of Drill run in Java. Part of the motivation
> for the in-memory representation of records that we chose, making it
> columnar and storing it in Java-native ByteBuffers, was to enable
> integration with native code compiled from C/C++ to run some of our
> operators. ByteBuffers are part of the official Java API, but their use is
> not recommended. They allow memory operations that you do not find in typical
> Java data types and structures, but require you to manage your own memory.
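
To make the columnar ByteBuffer idea above concrete, here is a minimal sketch of one integer column kept in an off-heap buffer. The class and method names (IntColumnSketch, append, sum) are made up for illustration and are not Drill's actual classes. It only assumes the standard java.nio API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Drill's real in-memory record format.
final class IntColumnSketch {
    private final ByteBuffer data; // off-heap storage for a single column
    private int count = 0;

    IntColumnSketch(int capacity) {
        // allocateDirect keeps the bytes outside the Java heap, which is what
        // lets native code address them without a copy.
        this.data = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                              .order(ByteOrder.nativeOrder());
    }

    // Appends one value; throws IndexOutOfBoundsException past capacity.
    void append(int value) {
        data.putInt(count * Integer.BYTES, value);
        count++;
    }

    int get(int index) {
        return data.getInt(index * Integer.BYTES);
    }

    int size() {
        return count;
    }

    // Scans the column sequentially; values sit contiguously in memory.
    long sum() {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        IntColumnSketch ages = new IntColumnSketch(4);
        ages.append(31);
        ages.append(27);
        ages.append(45);
        System.out.println(ages.sum()); // prints 103
    }
}
```

The caller computes offsets and tracks capacity by hand here, which is the "manage your own memory" trade-off mentioned above.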
>
> One important use case for us is the ability to pass them through the Java
> Native Interface without having to do a copy. While it is still inefficient
> to jump from Java to C for every record, we should be able to define a clean
> interface that takes a batch of records (around 1000) in a single jump to a
> C context. After the C code finishes processing them, a single jump back
> into the Java context should complete just as quickly.
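
The batch-at-a-time interface described above might look roughly like the sketch below. Everything named here (BatchKernelSketch, BatchKernel, DOUBLE_IN_PLACE) is hypothetical, and the kernel is a pure-Java stand-in so the example runs on its own. In a real JNI version, process would presumably be a native method whose C side gets the buffer's address through the JNI function GetDirectBufferAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical names throughout; this is not Drill's operator API.
final class BatchKernelSketch {

    // One call handles a whole batch, so the cost of crossing the
    // Java/native boundary is paid once per batch, not once per record.
    interface BatchKernel {
        void process(ByteBuffer column, int recordCount);
    }

    // Pure-Java stand-in for a native operator: doubles each int in place.
    static final BatchKernel DOUBLE_IN_PLACE = (column, recordCount) -> {
        for (int i = 0; i < recordCount; i++) {
            int offset = i * Integer.BYTES;
            column.putInt(offset, column.getInt(offset) * 2);
        }
    };

    // Builds a direct (off-heap) buffer holding the values 0..recordCount-1.
    static ByteBuffer newBatch(int recordCount) {
        ByteBuffer batch = ByteBuffer.allocateDirect(recordCount * Integer.BYTES)
                                     .order(ByteOrder.nativeOrder());
        for (int i = 0; i < recordCount; i++) {
            batch.putInt(i * Integer.BYTES, i);
        }
        return batch;
    }

    public static void main(String[] args) {
        int recordCount = 1000; // the rough batch size mentioned above
        ByteBuffer batch = newBatch(recordCount);
        DOUBLE_IN_PLACE.process(batch, recordCount); // a single call for all records
        System.out.println(batch.getInt(999 * Integer.BYTES)); // prints 1998
    }
}
```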
>
> With this consideration, any language you could pass data to from C would
> be compatible. While we likely will not support a wide array of plugin
> languages soon, it should be possible for people to plug in a variety of
> existing codebases to add data processing functionality to Drill.
>
> -Jason Altekruse
>
>
>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>> Drill is a very different tool from Spark or even from Spark SQL (aka
>> Shark).
>>
>> There is some overlap, but there are important differences. For instance,
>>
>> - Drill supports weakly typed SQL.
>>
>> - Drill has a very clever way to pass data from one processor to another.
>> This allows very efficient processing.
>>
>> - Drill generates code in response to the query and to the observed data.
>> This is a big deal since it allows high speed with dynamic types.
>>
>> - Drill supports full ANSI SQL, not Hive QL.
>>
>> - Spark supports programming in Scala.
>>
>> - Spark ties distributed data objects to objects in a language like Java or
>> Scala rather than using a columnar form. This makes generic user-written
>> code easier, but is less efficient.
>>
>>
>>
>>
>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
>> <nvn_ravi@hotmail.com> wrote:
>>
>>> Hi,
>>>
>>> I started exploring Drill, and it looks like a very interesting tool. Can
>>> somebody explain how Drill compares with Apache Spark and Storm?
>>> Do we still need Apache Spark along with Drill in the big data stack? Or
>>> can Drill directly serve as a replacement for Spark?
>>>
>>> Thanks,
>>> Ravi
>>