drill-dev mailing list archives

From Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>
Subject Re: Storage file format
Date Wed, 19 Sep 2012 06:26:12 GMT
Sure :-) I'll create a ticket in the Drill JIRA.

Thanks,
Tsuyoshi

On Sun, Sep 16, 2012 at 6:11 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> There is no project-wide roadmap in a real open source project.
>
> There are vision documents that various people use to try to motivate
> consensus.
>
> There are also individual roadmaps that describe what the individual
> contributors plan to do.
>
> PowerDrill-style in-memory data is definitely intriguing, and once Drill
> works, and works fast, on simpler structures, I would expect that somebody
> would be interested in implementing it.
>
> Perhaps that would be you?
>
> On Sat, Sep 15, 2012 at 10:16 AM, Tsuyoshi OZAWA
> <ozawa.tsuyoshi@gmail.com> wrote:
>
>> Hello,
>>
>> Is there a roadmap to support an in-memory index and storage like
>> PowerDrill? It's one kind of storage, though its format is different
>> from the columnar storage format in the Dremel paper, as you mentioned.
>>
>> IMO, an in-memory index and storage would be very useful for analysis on
>> a small cluster.
>>
>> Thanks,
>> - Tsuyoshi
>>
>> On Sun, Sep 16, 2012 at 2:02 AM, Dharm Raj <dharmrajbaliyan@gmail.com>
>> wrote:
>> > You are right, Camuel. While thinking about the storage format I was
>> > thinking about appends; I wrote "update" by mistake.
>> >
>> > On Sat, Sep 15, 2012 at 9:49 PM, Camuel Gilyadov <camuel@gmail.com>
>> > wrote:
>> >
>> >> Drill doesn't support updates. It is an append-only data store, and an
>> >> append is usually expected to be a sizable chunk of data, not a single
>> >> row.
>> >>
>> >> On Sat, Sep 15, 2012 at 8:09 AM, Dharm Raj <dharmrajbaliyan@gmail.com>
>> >> wrote:
>> >>
>> >> > For columnar storage, IMO each column can be managed in a separate
>> >> > file. Dremel also seems to keep each column in a separate file. This
>> >> > should be easy to manage, and updates are possible. Please see
>> >> > https://issues.apache.org/jira/browse/AVRO-806
>> >> >
>> >> > The Drill architecture slides show AVRO-806 and Trevni in the column
>> >> > storage box. Are we looking at them as candidate storage formats for
>> >> > Drill?
>> >> >
>> >> > If we have a lot of data with a high degree of sparsity, and the major
>> >> > use case is read-only once data is written, another option could be to
>> >> > store it in a column-major sparse matrix format. It looks easy to
>> >> > implement, but updates may be problematic. Just a thought.
>> >> >
>> >> > Regards,
>> >> > Dharm
>> >> >
>> >> > On Sat, Sep 15, 2012 at 7:24 PM, NAVEEN MAANJU
>> >> > <naveen.maanju.apache@gmail.com> wrote:
>> >> >
>> >> > > Makes sense.
>> >> > >
>> >> > > On Sat, Sep 15, 2012 at 6:44 AM, Ted Dunning <ted.dunning@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > > > The key goal here is to get something simple working quickly in a
>> >> > > > way that allows additional, more advanced implementations.
>> >> > > >
>> >> > > > On Sat, Sep 15, 2012 at 5:47 AM, moon soo Lee <leemoonsoo@gmail.com>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > For column storage, how about leveraging HBase or Accumulo?
>> >> > > > >
>> >> > > > > They would also give us a path to supporting data updates
>> >> > > > > (future work?).
>> >> > > > >
>> >> > > > >
>> >> > > > > On Sat, Sep 15, 2012 at 9:30 PM, Azuryy Yu <azuryyyu@gmail.com>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > > Hi All,
>> >> > > > > >
>> >> > > > > > I am interested in working on the storage format. (Sign up?)
>> >> > > > > >
>> >> > > > > > I wrote an HDFS file format that is similar to SequenceFile (row
>> >> > > > > > storage, block management, compression), and I provide an
>> >> > > > > > InputFormat and an OutputFormat for it.
>> >> > > > > >
>> >> > > > > > Sometimes it gets great performance, sometimes not; it depends
>> >> > > > > > on the data.
>> >> > > > > >
>> >> > > > > > For Drill, we should implement column storage. This lets us skip
>> >> > > > > > some columns during a query, and skip some rows within one
>> >> > > > > > column file. But this column storage should be based on a
>> >> > > > > > distributed file system such as HDFS or MapR DFS. I like MapR
>> >> > > > > > DFS because of its HA.
>> >> > > > > >
>> >> > > > > > We can implement the following column storage file format; I
>> >> > > > > > think it's enough for us.
>> >> > > > > >
>> >> > > > > > http://arxiv.org/pdf/1105.4252.pdf
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>>



-- 
OZAWA Tsuyoshi
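To make Dharm's one-file-per-column idea and Camuel's append-only, chunk-at-a-time model concrete, here is a minimal Python sketch. It is illustrative only and is not Drill's design: the class name ColumnStore, the ".col" file suffix, the JSON-lines chunk encoding, and the use of a local directory in place of HDFS or MapR DFS are all assumptions.

```python
# Sketch only: an append-only columnar store with one file per column.
# Assumptions (not from the thread): a local directory stands in for
# HDFS/MapR DFS, each appended batch is one JSON line per column file,
# and ColumnStore is a hypothetical name.
import json
import os
import tempfile
from typing import Dict, Iterable, List, Sequence


class ColumnStore:
    def __init__(self, directory: str, columns: Sequence[str]) -> None:
        self.directory = directory
        self.columns = list(columns)
        os.makedirs(directory, exist_ok=True)

    def _path(self, column: str) -> str:
        return os.path.join(self.directory, f"{column}.col")

    def append_chunk(self, rows: List[Dict[str, object]]) -> None:
        """Append a whole batch of rows; there is no update or delete path."""
        if not rows:
            return
        for col in self.columns:
            # One batch becomes one chunk (one JSON line) in each column file;
            # a row that lacks this column is stored as null.
            with open(self._path(col), "a", encoding="utf-8") as f:
                f.write(json.dumps([row.get(col) for row in rows]) + "\n")

    def scan(self, projection: Iterable[str]) -> Dict[str, list]:
        """Read only the projected columns; other column files are never opened."""
        out: Dict[str, list] = {}
        for col in projection:
            values: list = []
            path = self._path(col)
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        values.extend(json.loads(line))
            out[col] = values
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = ColumnStore(d, ["user", "country", "clicks"])
        store.append_chunk([{"user": "a", "country": "JP", "clicks": 3},
                            {"user": "b", "country": "IN", "clicks": 5}])
        store.append_chunk([{"user": "c", "clicks": 1}])
        print(store.scan(["clicks", "country"]))
        # {'clicks': [3, 5, 1], 'country': ['JP', 'IN', None]}
```

Each append_chunk call adds one chunk to every column file, and scan(["clicks"]) never opens user.col or country.col, which is the column skipping Azuryy describes. Supporting in-place updates would mean rewriting existing chunks, which is one reason the append-only model is the simpler starting point.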

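Dharm's "column-major sparse matrix format" most closely matches the standard compressed sparse column (CSC) layout. The sketch below assumes that reading (the thread does not name a specific format), and CscMatrix is a hypothetical name. It stores only the cells that are present, so sparse columns cost little space, and any single column can be read as one contiguous slice.

```python
# Sketch only: compressed sparse column (CSC) storage, one reading of
# "column-major sparse matrix format". CscMatrix is a hypothetical name.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class CscMatrix:
    n_rows: int
    # Column j's entries occupy positions col_ptr[j] .. col_ptr[j+1]-1.
    col_ptr: List[int] = field(default_factory=lambda: [0])
    row_idx: List[int] = field(default_factory=list)   # row of each stored value
    values: List[float] = field(default_factory=list)  # present values only

    @classmethod
    def from_columns(cls, n_rows: int,
                     columns: Sequence[Sequence[Optional[float]]]) -> "CscMatrix":
        m = cls(n_rows)
        for col in columns:
            for r, v in enumerate(col):
                if v is not None:  # missing cells (None) take no space
                    m.row_idx.append(r)
                    m.values.append(v)
            m.col_ptr.append(len(m.values))
        return m

    def column(self, j: int) -> Dict[int, float]:
        """Read one column as {row: value} from a single contiguous slice."""
        lo, hi = self.col_ptr[j], self.col_ptr[j + 1]
        return dict(zip(self.row_idx[lo:hi], self.values[lo:hi]))


if __name__ == "__main__":
    m = CscMatrix.from_columns(4, [[1.0, None, None, 4.0],
                                   [None, None, None, None],
                                   [None, 2.5, None, None]])
    print(m.col_ptr, m.row_idx, m.values)  # [0, 2, 2, 3] [0, 3, 1] [1.0, 4.0, 2.5]
    print(m.column(0))                     # {0: 1.0, 3: 4.0}
```

The layout also shows why updates are awkward, as Dharm suspected: adding one value to column j means shifting every later entry in row_idx and values and incrementing col_ptr for all following columns, a cost a write-once, read-many workload never pays.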
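Azuryy's point about skipping "some rows within one column file" requires some per-block metadata. The thread does not say what that metadata is; the sketch below assumes per-block minimum/maximum statistics (often called zone maps), and Block, build_blocks and rows_in_range are hypothetical names. A range predicate only scans the blocks whose [min, max] interval overlaps the query range.

```python
# Sketch only: skipping row blocks inside one column file using per-block
# min/max statistics. The statistics scheme is an assumption, and Block,
# build_blocks and rows_in_range are hypothetical names.
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Block:
    first_row: int      # row number of the block's first value
    lo: int             # smallest value in the block
    hi: int             # largest value in the block
    values: List[int]


def build_blocks(column: Sequence[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = list(column[start:start + block_size])
        blocks.append(Block(start, min(chunk), max(chunk), chunk))
    return blocks


def rows_in_range(blocks: Sequence[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return (row numbers with lo <= value <= hi, number of blocks scanned)."""
    rows: List[int] = []
    scanned = 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # stats rule out the whole block, so its values are not read
        scanned += 1
        rows.extend(b.first_row + i for i, v in enumerate(b.values) if lo <= v <= hi)
    return rows, scanned


if __name__ == "__main__":
    blocks = build_blocks([1, 2, 3, 10, 11, 12, 20, 21, 22], block_size=3)
    print(rows_in_range(blocks, 10, 12))  # ([3, 4, 5], 1)
```

How much this skips depends on how values are clustered: when similar values sit in the same blocks, most blocks are ruled out, but when values are randomly ordered, nearly every block's [min, max] overlaps the query and little is saved.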