asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Metadata changes
Date Tue, 15 Dec 2015 01:15:28 GMT
Can you briefly explain why option 3 is so heavy? (Remind us how the use 
info is modeled?)

On 12/14/15 3:43 PM, Steven Jacobs wrote:
> We just had a UCR discussion on this topic. The issue is really with the
> third "index" here. The code currently uses one "index" to go in two
> directions:
> 1) To find datatypes that use datatype A
> 2) To find datatypes that are used by datatype A.
>
> The way that it works now is hacked together, but designed for performance.
> So we have three choices here:
>
> 1) Stick to the status quo, and leave the "indexes" as they are
> 2) Remove the Metadata secondary indexes, which will eliminate the hack but
> cost some performance on Metadata
> 3) Implement the Metadata secondary indexes correctly as Asterix indexes.
> For this solution to work with our dataset designs, we will need to have
> the ability to index homogeneous lists. In addition, we will have backward
> compatibility issues unless we plan things out for the transition.
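To make the "two directions" concrete, here is a minimal sketch. It assumes that each datatype's metadata record carries a homogeneous list of the (dataverse, datatype) keys it references. The record layout and the names `records`, `build_indexes`, `uses`, and `used_by` are hypothetical illustrations, not AsterixDB's actual Metadata code. The sketch keeps one index per lookup direction, which is roughly what option 3 amounts to, and shows why it needs list indexing: each element of the reference list produces its own index entry.

```python
from collections import defaultdict

# Hypothetical layout: each datatype record, keyed by (dataverse, name),
# carries a homogeneous list of the (dataverse, name) keys it references.
records = {
    ("dv1", "OrderType"): [("dv1", "AddressType"), ("dv2", "ItemType")],
    ("dv1", "AddressType"): [],
    ("dv2", "ItemType"): [],
}


def build_indexes(records):
    """Build one index per lookup direction from the reference lists."""
    uses = defaultdict(set)     # A -> datatypes used by A
    used_by = defaultdict(set)  # A -> datatypes that use A
    for owner, refs in records.items():
        for ref in refs:  # indexing a list field: one entry per element
            uses[owner].add(ref)
            used_by[ref].add(owner)
    return uses, used_by


uses, used_by = build_indexes(records)
print(sorted(uses[("dv1", "OrderType")]))  # [('dv1', 'AddressType'), ('dv2', 'ItemType')]
print(used_by[("dv2", "ItemType")])        # {('dv1', 'OrderType')}
```

With one index per direction, each lookup is a plain key match. The cost is that every reference produces two entries, which have to be kept in sync on insert and drop.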
>
> What are everyone's thoughts?
>
>
> Orthogonally, it seems that the consensus for storing the datatype
> dataverse in the dataset Metadata is to just add it as an open field, at
> least for now. Is that correct?
>
> Steven
>
>
> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <dtabass@gmail.com> wrote:
>
>> Thoughts inlined:
>>
>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>
>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>> secondary indexes:
>>>
>>> First of all, it seems that datasets are local to node groups, but
>>> dataverses can span node groups, which seems a little odd to me.
>>>
>> Node groups are an undocumented but to-be-exploited-someday feature that
>> allows datasets to be stored on fewer than all of the nodes in a given cluster. As
>> we face bigger clusters, we'll want to open up that possibility.  We will
>> hopefully use them inside w/o having to make users manage them manually
>> like parallel DB2 did/does.  Dataverses are really just a namespace thing,
>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>> node groups.
>>
>>> There are three Metadata secondary indexes:  GROUPNAME_ON_DATASET_INDEX,
>>> DATATYPENAME_ON_DATASET_INDEX, DATATYPENAME_ON_DATATYPE_INDEX
>>>
>>> The first is used in only one case:
>>> when dropping a node group, check whether any datasets use this node
>>> group; if so, don't allow the drop.
>>> But this index has a field called "dataverse" that is not used at all.
>>>
>> This one seems like a waste of space since we do this almost never. (Not
>> much space, but unnecessary.) If we keep it, it should become a proper
>> index.
>>
>>> The second is used when dropping a datatype. If there is a dataset using
>>> this datatype, don't allow the drop.
>>> Similarly, this index has a "dataverse" which is never used.
>>>
>> You're about to use the dataverse part, right?  :-)  This index seems like
>> it will be useful but should be a proper index.
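The drop check described above can be sketched as follows. This is a minimal illustration, assuming a secondary index keyed on the full (dataverse, datatype) pair. That key includes the "dataverse" part the thread expects to start using once datasets can reference types from other dataverses. `DatasetTypeIndex`, `drop_datatype`, and the use of `ValueError` are hypothetical names, not AsterixDB APIs. The node-group check would follow the same pattern, keyed on the group name.

```python
from collections import defaultdict


class DatasetTypeIndex:
    """Hypothetical secondary index: (type dataverse, type name) -> datasets."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add_dataset(self, dataset, type_dataverse, type_name):
        # Key on the full pair, since a dataset may now use a type
        # that lives in a different dataverse.
        self._entries[(type_dataverse, type_name)].add(dataset)

    def datasets_using(self, type_dataverse, type_name):
        # .get avoids creating empty entries on lookup.
        return set(self._entries.get((type_dataverse, type_name), ()))


def drop_datatype(index, type_dataverse, type_name):
    """Refuse the drop while any dataset still uses the type."""
    users = index.datasets_using(type_dataverse, type_name)
    if users:
        raise ValueError(
            f"cannot drop {type_dataverse}.{type_name}: used by {sorted(users)}"
        )
    # Otherwise the datatype's Metadata record would be deleted here.


index = DatasetTypeIndex()
index.add_dataset(("dv2", "Orders"), "dv1", "OrderType")
drop_datatype(index, "dv1", "AddressType")  # no users, so the drop is allowed
try:
    drop_datatype(index, "dv1", "OrderType")
except ValueError as err:
    print(err)  # cannot drop dv1.OrderType: used by [('dv2', 'Orders')]
```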
>>
>>> The third index is used in two cases, with two different notions of
>>> "key".
>>> It seems like this should actually be two different indexes.
>>>
>> I don't think I understood this comment....
>>
>>
>>> This is my understanding so far. It would be good to discuss what the
>>> "correct" version should be.
>>> Steven
>>>
>>>
>>>
>>>
>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>
>>>> Hi all,
>>>> I'm implementing a change so that datasets can use datatypes from
>>>> other dataverses (previously, a dataset and its type had to be in the
>>>> same dataverse). Unfortunately, this means another change to the Dataset
>>>> Metadata (which will now store the dataverse of its type).
>>>>
>>>> As such, I had a couple of questions:
>>>>
>>>> 1) Should this change be thrown into the release branch, as it is another
>>>> Metadata change?
>>>>
>>>> 2) In implementing this change, I've been looking at the Metadata
>>>> secondary indexes. I had a discussion with Ildar, and it seems the thread
>>>> on Metadata secondary indexes being "hacked" has been lost. Is this also
>>>> something that should get into the release? Is there anyone currently
>>>> looking at it?
>>>>
>>>> Steven
>>>>
>>>>

