trafodion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: Index Type
Date Thu, 06 Aug 2015 15:58:48 GMT
Hi Layla,

I'm not sure what you're asking when you ask about "Index Type". The
reference to B-trees makes me think you're asking about the physical
implementation of the indexes.

Secondary indexes in Trafodion are just HBase tables, no different from
Trafodion tables, and therefore have physical characteristics similar to
base tables. That is, they are log-structured merge files, just like the
base tables.

When the Optimizer chooses to use an index, it may be doing so for a couple
of reasons. It might, for example, discover that an index has all the
columns referenced in a query, but in a more pleasing order (for example,
perhaps the rows can be accessed directly, while in the base table a full
scan might be required). This would result in an index-only scan.
Alternatively, it might discover that an index doesn't contain all the
columns, but does offer direct access given the predicates which again would
require a larger scan on the base table. If this is the case, the Optimizer
will consider joining the index to the base table, and compare the cost of
the join plan against a base table scan.

In recent days, some tweaks and optimizations have been added to indexes,
which you probably don't have in your test bed yet.

For example, a recent check-in allows indexes to be in "aligned row format"
independent of the base table. "Aligned row format" is a format where all
the Trafodion columns for a given row are packaged in a single cell from an
HBase perspective. It is more efficient when a workload doesn't have
updates. And in a given application, what are updates to a base table might
become deletes and inserts into an index (for example, if it is
predominately the index columns that are updated). So it might be more
optimal from a path length perspective to use a different physical row
format for an index than a base table.

Down the road I would not be surprised if we make further such tweaks.

Dave



-----Original Message-----
From: Martin, Layla (HP DualStudy) [mailto:layla.martin@hp.com]
Sent: Thursday, August 6, 2015 6:27 AM
To: dev@trafodion.incubator.apache.org
Subject: RE: Index Type

Hi QiFan,

Thanks for your help!

Can you tell me which index type Trafodion uses? (This is important for me,
as I'm supposed to write a paper for university about this)

It always uses intervals = frequency.

Thanks
Layla

Here is the output for f=1,5,50:
freqency = 1
>>showstats for table testdata3 on s_key detail;

Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID:
94086134586213909

Hist ID:    1838554956
Column(s):  S_KEY
Total Rows: 50000
Total UEC:  1
Low Value:  (5)
High Value: (5)
Intervals:  1

Number    Rowcount         UEC Boundary
====== =========== =========== ======================================
     0           0           0 (5)
     1       50000           1 (5)


--- SQL operation complete.


frequency = 5
>>showstats for table testdata3 on s_key detail;

Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID:
134055579012223698

Hist ID:    647527840
Column(s):  S_KEY
Total Rows: 50000
Total UEC:  5
Low Value:  (0)
High Value: (4)
Intervals:  5

Number    Rowcount         UEC Boundary
====== =========== =========== ======================================
     0           0           0 (0)
     1        9812           1 (0)
     2       10092           1 (1)
     3       10029           1 (2)
     4        9981           1 (3)
     5       10086           1 (4)


--- SQL operation complete.


frequency=50
>>showstats for table testdata3 on s_key detail;

Detailed Histogram data for Table TRAFODION.SEABASE.TESTDATA3 Table ID:
134055579012309892

Hist ID:    1526840203
Column(s):  S_KEY
Total Rows: 50000
Total UEC:  50
Low Value:  (0)
High Value: (49)
Intervals:  50

Number    Rowcount         UEC Boundary
====== =========== =========== ======================================
     0           0           0 (0)
     1        1041           1 (0)
     2         991           1 (1)
     3        1010           1 (2)
     4         963           1 (3)
     5         976           1 (4)
     6         969           1 (5)
     7         981           1 (6)
     8        1050           1 (7)
     9         986           1 (8)
    10        1057           1 (9)
    11        1033           1 (10)
    12        1016           1 (11)
    13         990           1 (12)
    14         980           1 (13)
    15        1021           1 (14)
    16        1009           1 (15)
    17        1004           1 (16)
    18         966           1 (17)
    19         998           1 (18)
    20         979           1 (19)
    21        1010           1 (20)
    22        1013           1 (21)
    23         974           1 (22)
    24        1023           1 (23)
    25         983           1 (24)
    26        1051           1 (25)
    27        1027           1 (26)
    28        1054           1 (27)
    29         991           1 (28)
    30         993           1 (29)
    31         993           1 (30)
    32         991           1 (31)
    33         931           1 (32)
    34         942           1 (33)
    35         995           1 (34)
    36         953           1 (35)
    37         945           1 (36)
    38        1015           1 (37)
    39        1019           1 (38)
    40         956           1 (39)
    41         963           1 (40)
    42        1004           1 (41)
    43        1017           1 (42)
    44        1047           1 (43)
    45        1021           1 (44)
    46        1019           1 (45)
    47        1013           1 (46)
    48         986           1 (47)
    49        1056           1 (48)
    50         995           1 (49)


--- SQL operation complete.

Layla Martin
HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com

-----Original Message-----
From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Donnerstag, 6. August 2015 14:30
To: dev
Subject: Re: Index Type

Hi Layla,

Glad to hear that you were able to see index plans for some of your data.

Can you do a showstats for the index column and find out the interval(s) in
which values with frequency of 1 and 5?

The compiler relies on the stats to figure out the cardinality and it is
possible these extreme low frequency values are buried in some of the
intervals and their low frequencies are not known spcifically.  If that is
the case,  we may need to increase # of intervals (say to 100 or even
higher) so that the intervals correctly represent the low frequency values.
The default is 50 to reduce the memory usage.

BTW, the compiler does recognize an opposite case where some value has very
high frequency (say 10% of the total rows) and applies special joins (skew
buster) when necessary.

showstats for table <T> on <column> detail;

update statistics for table <T> on <column> generate <n> intervals;


Thanks --Qifan

On Thu, Aug 6, 2015 at 2:25 AM, Martin, Layla (HP DualStudy) <
layla.martin@hp.com> wrote:

> Hi,
>
>
>
> Sorry for just another question …
>
>
>
> I got another question regarding Trafodion indexes. Which type of
> index do you use?
>
>
>
> I conducted a test with 50000 table rows and a varying frequency
> (number of different entries in the indexed column).
>
>
>
> This test has shown the best results (best benefit over not using an
> index) for frequencies 100 and 1000 (I’ve tested 1, 5, 50, 100, 1000,
> 10000 and 50000, the optimizer didn’t use the index for 1 and 5).
>
> I thought that B-Trees work best with a frequency close to log(number
> of rows), but the results have been best close to sqrt(number of rows).
>
>
>
> Do you have an explanation for that?
>
> I didn’t find any information on that in the docu/wiki or on Git.
>
>
>
> Best regards
>
> Layla
>
>
>
> *Layla Martin*
> HP DualStudy
>
> Hewlett-Packard GmbH
>
> Telefon +49 7031 4504682
>
> *layla.martin@hp.com <layla.martin@hp.com>*
>
> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy
>
> [image: Description: Description:
> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogo
> Small.png]
>
> Follow us on: [image: facebook] <https://www.facebook.com/DualStudy.hp>
> [image:
> twitter] <http://twitter.com/hpdualstudy> [image: youtube]
> <http://www.youtube.com/user/hpDualStudy>
>
> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst
> Reichart Vorsitzender des Aufsichtsrats: Jörg Menno Harms Sitz der
> Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
> WEEE-Reg.-Nr. DE 30409072
>
>
>
>
>



--
Regards, --Qifan

Mime
View raw message