asterixdb-dev mailing list archives

From Mike Carey <>
Subject Re: Metadata changes
Date Tue, 15 Dec 2015 01:51:49 GMT
Ah...  Indeed, got it.
It sure would be nice to have such indexes...  :-)
(In general they would be very useful.)
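
[Editor's sketch] The two type-deletion cases in the quoted message below can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the AsterixDB Metadata code: the `catalog` dictionary, the `anonymous` flag, the `PRIMITIVES` set, and all function names are hypothetical. Only the "Fields" list with its "fieldname" and "fieldtype" entries comes from the thread. It shows why case 1 needs no index (it walks the record's own list) while case 2 needs either a full scan or an index keyed on Fields.fieldtype.

```python
from collections import defaultdict

# Assumption: a tiny primitive set, just enough for the demo.
PRIMITIVES = {"int32", "string"}


def make_type(name, fields, anonymous=False):
    """Build a hypothetical Datatype record with a "Fields" list."""
    return {
        "name": name,
        "anonymous": anonymous,
        "Fields": [{"fieldname": n, "fieldtype": t} for n, t in fields],
    }


def delete_type(catalog, name):
    """Case 1: delete a type and, recursively, its anonymous subtypes."""
    record = catalog.pop(name)
    for field in record["Fields"]:
        child = catalog.get(field["fieldtype"])
        if field["fieldtype"] not in PRIMITIVES and child and child["anonymous"]:
            delete_type(catalog, child["name"])


def parents_by_scan(catalog, name):
    """Case 2 without an index: inspect every record's Fields list."""
    return sorted(
        r["name"]
        for r in catalog.values()
        if any(f["fieldtype"] == name for f in r["Fields"])
    )


def build_fieldtype_index(catalog):
    """Case 2 with an index on Fields.fieldtype (field in a record in a list)."""
    index = defaultdict(set)
    for r in catalog.values():
        for f in r["Fields"]:
            index[f["fieldtype"]].add(r["name"])
    return index


catalog = {
    t["name"]: t
    for t in [
        make_type("Address", [("city", "string")]),
        make_type("User", [("id", "int32"), ("home", "Address"),
                           ("extra", "User_extra")]),
        make_type("User_extra", [("age", "int32")], anonymous=True),
    ]
}
```

With this model, `parents_by_scan(catalog, "Address")` touches every record, while a lookup in the result of `build_fieldtype_index(catalog)` goes straight to the parent types. Building that index requires iterating over list elements, which is the "index on homogeneous lists" capability the thread says is missing.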

On 12/14/15 5:29 PM, Steven Jacobs wrote:
> There are two cases where the code is attempting to use indexes:
> 1) When deleting a type, find and delete the anonymous subtypes.
> 2) When deleting a type, confirm that it is not used as a nested type of
> another type.
> Ignoring the "indexes" that we have in Metadata, Datatype records have a
> field called "Fields" which contains a list of the fields within the type.
> Each value in this list has a "fieldname" and "fieldtype."
> For 1 we can simply iterate through this list and call delete recursively
> whenever "fieldtype" refers to a type that is both non-primitive and anonymous.
> For 2 we need some way to find parent types given a type. The only way to
> do this quickly would be to create an index on Fields.fieldtype which is a
> field within a record within a list.
> Steven
> On Monday, December 14, 2015, Mike Carey <> wrote:
>> Can you briefly explain why option 3 is so heavy? (Remind us how the use
>> info is modeled?)
>> On 12/14/15 3:43 PM, Steven Jacobs wrote:
>>> We just had a UCR discussion on this topic. The issue is really with the
>>> third "index" here. The code now is using one "index" to go in two
>>> directions:
>>> 1) To find datatypes that use datatype A
>>> 2) To find datatypes that are used by datatype A.
>>> The way that it works now is hacked together, but designed for
>>> performance.
>>> So we have three choices here:
>>> 1) Stick to the status quo, and leave the "indexes" as they are
>>> 2) Remove the Metadata secondary indexes, which will eliminate the hack
>>> but
>>> cost some performance on Metadata
>>> 3) Implement the Metadata secondary indexes correctly as Asterix indexes.
>>> For this solution to work with our dataset designs, we will need to have
>>> the ability to index homogeneous lists. In addition, we will have backward
>>> compatibility issues unless we plan things out for the transition.
>>> What are the thoughts?
>>> Orthogonally, it seems that the consensus for storing the datatype
>>> dataverse in the dataset Metadata is to just add it as an open field at
>>> least for now. Is that correct?
>>> Steven
>>> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <> wrote:
>>> Thoughts inlined:
>>>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>>>> secondary indexes:
>>>>> First of all it seems that datasets are local to node groups, but
>>>>> dataverses can span node groups, which seems a little odd to me.
>>>>> Node groups are an undocumented but to-be-exploited-someday feature that
>>>> allows datasets to be stored on fewer than all nodes in a given cluster.
>>>> As
>>>> we face bigger clusters, we'll want to open up that possibility.  We will
>>>> hopefully use them inside w/o having to make users manage them manually
>>>> like parallel DB2 did/does.  Dataverses are really just a namespace
>>>> thing,
>>>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>>>> node groups.
>>>> There are three Metadata secondary indexes:  GROUPNAME_ON_DATASET_INDEX,
>>>>> The first is used in only one case:
>>>>> When dropping a node group, check if there are any datasets using this
>>>>> node
>>>>> group. If so, don't allow the drop
>>>>> BUT, this index has a field called "dataverse" which is not used at all.
>>>>> This one seems like a waste of space since we do this almost never. (Not
>>>> much space, but unnecessary.)  If we keep it it should become a proper
>>>> index.
>>>> The second is used when dropping a datatype. If there is a dataset using
>>>>> this datatype, don't allow the drop.
>>>>> Similarly, this index has a "dataverse" which is never used.
>>>>> You're about to use the dataverse part, right?  :-)  This index seems
>>>> like
>>>> it will be useful but should be a proper index.
>>>> The third index is used in two cases, using two different ideas of
>>>>> "keys"
>>>>> It seems like this should actually be two different indexes.
>>>>> I don't think I understood this comment....
>>>> This is my understanding so far. It would be good to discuss what the
>>>>> "correct" version should be.
>>>>> Steven
>>>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <>
>>>>> wrote:
>>>>> Hi all,
>>>>>> I'm implementing a change so that datasets can use datatypes from
>>>>>> alternate dataverses (previously the type and set had to be from the
>>>>>> same
>>>>>> dataverse). Unfortunately this means another change for Dataset
>>>>>> Metadata
>>>>>> (which will now store the dataverse for its type).
>>>>>> As such, I had a couple of questions:
>>>>>> 1) Should this change be thrown into the release branch, as it is
>>>>>> another
>>>>>> Metadata change?
>>>>>> 2) In implementing this change, I've been looking at the Metadata
>>>>>> secondary indexes. I had a discussion with Ildar, and it seems the
>>>>>> thread
>>>>>> on Metadata secondary indexes being "hacked" has been lost. Is this
>>>>>> also
>>>>>> something that should get into the release? Is there anyone currently
>>>>>> looking at it?
>>>>>> Steven
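
[Editor's sketch] The drop checks discussed in the thread (refuse to drop a node group or a datatype while some dataset still uses it) could look roughly like the following. This is an illustration under assumptions, not the actual Metadata implementation: the record layout, the `datatype_dataverse` field name, and every function name here are hypothetical. The point it shows is that once a dataset may use a datatype from another dataverse, the datatype check has to match on the (dataverse, type name) pair rather than on the type name alone.

```python
class MetadataError(Exception):
    """Raised when a drop would leave a dataset with a dangling reference."""


def users_of_nodegroup(datasets, groupname):
    """Names of datasets stored on the given node group."""
    return [d["name"] for d in datasets if d["groupname"] == groupname]


def users_of_datatype(datasets, type_dataverse, type_name):
    """Names of datasets whose type is exactly (type_dataverse, type_name)."""
    return [
        d["name"]
        for d in datasets
        if (d["datatype_dataverse"], d["datatype"]) == (type_dataverse, type_name)
    ]


def drop_nodegroup(nodegroups, datasets, groupname):
    users = users_of_nodegroup(datasets, groupname)
    if users:
        raise MetadataError(f"node group {groupname} is used by {users}")
    nodegroups.remove(groupname)


def drop_datatype(datatypes, datasets, type_dataverse, type_name):
    users = users_of_datatype(datasets, type_dataverse, type_name)
    if users:
        raise MetadataError(f"{type_dataverse}.{type_name} is used by {users}")
    datatypes.remove((type_dataverse, type_name))


# Hypothetical metadata: one dataset in "Shop" whose type lives in "Common".
nodegroups = {"ng1", "ng2"}
datatypes = {("Common", "OrderType"), ("Shop", "OrderType")}
datasets = [
    {"dataverse": "Shop", "name": "Orders", "groupname": "ng1",
     "datatype_dataverse": "Common", "datatype": "OrderType"},
]
```

Here `drop_datatype(datatypes, datasets, "Shop", "OrderType")` succeeds because the dataset's type is `Common.OrderType`, whereas dropping `Common.OrderType` or node group `ng1` raises `MetadataError`. Both helper lookups are linear scans; a secondary index would replace them with a keyed lookup.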
