phoenix-dev mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1296) Scan entire region when tenant-specific table is analyzed
Date Fri, 26 Sep 2014 12:28:33 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149090#comment-14149090 ]

ramkrishna.s.vasudevan commented on PHOENIX-1296:
-------------------------------------------------

Anyway, I have updated the patch so that guideposts work in the multi-tenant
scenario.

->Previously the tenantID was not passed correctly for clearing the cache when ANALYZE
'table' is done.
->In order to get the entry updated, we add an entry in SYSTEM.CATALOG for the
table with a ts + 1 for the EMPTY_COLUMN (_0).
But in the case of a multi-tenant table this does not work correctly.
{code}
[ZZTop\x00\x00TENANT_TABLE/0:COLUMN_COUNT/300/Put/vlen=4/mvcc=7, ZZTop\x00\x00TENANT_TABLE/0:DISABLE_WAL/300/Put/vlen=1/mvcc=7,
ZZTop\x00\x00TENANT_TABLE/0:IMMUTABLE_ROWS/300/Put/vlen=1/mvcc=7, ZZTop\x00\x00TENANT_TABLE/0:MULTI_TENANT/300/Put/vlen=1/mvcc=7,
ZZTop\x00\x00TENANT_TABLE/0:TABLE_SEQ_NUM/300/Put/vlen=8/mvcc=7, ZZTop\x00\x00TENANT_TABLE/0:TABLE_TYPE/300/Put/vlen=1/mvcc=7,
ZZTop\x00\x00TENANT_TABLE/0:VIEW_STATEMENT/300/Put/vlen=57/mvcc=7, ZZTop\x00\x00TENANT_TABLE/0:VIEW_TYPE/300/Put/vlen=1/mvcc=7,
ZZTop\x00\x00TENANT_TABLE/0:_0/301/Put/vlen=0/mvcc=10]
{code}

{code}
0:DEFAULT_COLUMN_FAMILY/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:DISABLE_WAL/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:IMMUTABLE_ROWS/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:INDEX_DISABLE_TIMESTAMP/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:INDEX_STATE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:INDEX_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:MULTI_TENANT/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:PK_NAME/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:SALT_BUCKETS/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:TABLE_SEQ_NUM/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:TABLE_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:VIEW_INDEX_ID/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0,
/0:VIEW_STATEMENT/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, /0:VIEW_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0]
{code}

So the last cell in the result (_0) is skipped by this loop, which exits as soon as all of
TABLE_KV_COLUMNS has been consumed:
{code}
        while (i < results.size() && j < TABLE_KV_COLUMNS.size()) {
            Cell kv = results.get(i);
            Cell searchKv = TABLE_KV_COLUMNS.get(j);
            int cmp =
                    Bytes.compareTo(kv.getQualifierArray(), kv.getQualifierOffset(),
                        kv.getQualifierLength(), searchKv.getQualifierArray(),
                        searchKv.getQualifierOffset(), searchKv.getQualifierLength());
            if (cmp == 0) {
                // Find max timestamp of table header row
                timeStamp = Math.max(timeStamp, kv.getTimestamp());
                tableKeyValues[j++] = kv;
                i++;
            } else if (cmp > 0) {
                timeStamp = Math.max(timeStamp, kv.getTimestamp()); 
                tableKeyValues[j++] = null;
            } else {
                i++; // shouldn't happen - means unexpected KV in system table header row
            }
        }
{code}

So in order to use that entry, I have just added a check like this after the loop:
{code}
        while (i < results.size()) {
            Cell kv = results.get(i);
            if (Bytes.compareTo(kv.getQualifierArray(), kv.getQualifierOffset(), kv.getQualifierLength(),
                    QueryConstants.EMPTY_COLUMN_BYTES, 0, QueryConstants.EMPTY_COLUMN_BYTES.length) == 0) {
                keyValue = kv;
                break;
            }
            i++;
        }
{code}
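
To see why the merge loop never reaches the _0 cell and how the extra scan picks it up, here is a
minimal, self-contained sketch. It does not use the HBase or Phoenix classes: SimpleCell, merge,
findEmptyColumn and the shortened column list are hypothetical stand-ins, and String.compareTo
replaces Bytes.compareTo on the qualifier bytes. Timestamp tracking is left out to keep it short.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderRowMergeSketch {
    // Stand-in for QueryConstants.EMPTY_COLUMN_BYTES (assumption: modelled as a String).
    static final String EMPTY_COLUMN = "_0";

    // Hypothetical simplified cell: only the qualifier and the timestamp.
    static final class SimpleCell {
        final String qualifier;
        final long timestamp;

        SimpleCell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Same shape as the merge loop quoted above: both inputs are sorted by qualifier.
    // Fills matched[] and returns the index in results at which the loop stopped.
    static int merge(List<SimpleCell> results, List<String> searchColumns, SimpleCell[] matched) {
        int i = 0;
        int j = 0;
        while (i < results.size() && j < searchColumns.size()) {
            SimpleCell kv = results.get(i);
            int cmp = kv.qualifier.compareTo(searchColumns.get(j));
            if (cmp == 0) {
                matched[j++] = kv;   // column present in the row
                i++;
            } else if (cmp > 0) {
                matched[j++] = null; // column absent from the row
            } else {
                i++;                 // unexpected cell, skip it
            }
        }
        return i;
    }

    // Same shape as the added check: scan the remaining cells for the empty column.
    static SimpleCell findEmptyColumn(List<SimpleCell> results, int from) {
        for (int i = from; i < results.size(); i++) {
            if (EMPTY_COLUMN.equals(results.get(i).qualifier)) {
                return results.get(i);
            }
        }
        return null;
    }

    // Shortened version of the header row from the first dump above.
    static List<SimpleCell> sampleResults() {
        return Arrays.asList(
                new SimpleCell("MULTI_TENANT", 300L),
                new SimpleCell("TABLE_TYPE", 300L),
                new SimpleCell("VIEW_TYPE", 300L),
                new SimpleCell(EMPTY_COLUMN, 301L)); // '_' sorts after 'A'..'Z'
    }

    // Shortened, illustrative subset of the search columns.
    static List<String> sampleColumns() {
        return Arrays.asList("MULTI_TENANT", "SALT_BUCKETS", "TABLE_TYPE", "VIEW_TYPE");
    }

    public static void main(String[] args) {
        List<SimpleCell> results = sampleResults();
        List<String> columns = sampleColumns();
        SimpleCell[] matched = new SimpleCell[columns.size()];

        int stoppedAt = merge(results, columns, matched);
        System.out.println("merge stopped at index " + stoppedAt + " of " + results.size());

        SimpleCell empty = findEmptyColumn(results, stoppedAt);
        System.out.println("empty column timestamp: "
                + (empty == null ? "not found" : String.valueOf(empty.timestamp)));
    }
}
```

With this sample the merge stops at index 3, because matching VIEW_TYPE uses up the last search
column while the _0 cell is still unread. The follow-up scan then returns that cell with its
ts + 1 timestamp (301 here).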


> Scan entire region when tenant-specific table is analyzed
> ---------------------------------------------------------
>
>                 Key: PHOENIX-1296
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1296
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>
> Based on the issue you've uncovered (that stats must be updated completely for a region),
> there's a bit of follow on work needed if an ANALYZE is done on a tenant-specific table. This
> case will be optimized to only scan and analyze the current tenant's data, however we have
> to make sure that the entire region(s) containing that tenant's data is scanned (or we'll
> end up replacing the stats for that region with just the one we calculated for that tenant).
> We should be able to do that based on ScanUtil.isAnalyzeTable(scan) being true in DefaultParallelIteratorRegionSplitter
> and/or ParallelIterators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
