phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2209) Building Local Index Asynchronously via IndexTool fails to populate index table
Date Tue, 07 Jun 2016 06:52:21 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317995#comment-15317995 ]

James Taylor commented on PHOENIX-2209:
---------------------------------------

We're making too many special cases for local indexes throughout the code. We should be
able to keep this on the server side. Why not just let local indexes be populated through
the coprocessors on the server side? If need be, we can disallow using the IndexTool to
generate HFiles when local indexes are present. Our real use case for the IndexTool is
to build indexes asynchronously through the regular front-door HBase APIs, in which case
the coprocessors would fire.
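
To make that concrete, here is a minimal sketch (an assumption, not actual Phoenix code) of the kind of guard IndexTool could apply during argument validation: reject HFile generation when the target index is local, so local index rows only ever go through the regular write path where the coprocessors fire. The variable names pIndexTable and outputPath are hypothetical; PTable.IndexType.LOCAL is Phoenix's existing enum value for local indexes.

{quote}
// Hypothetical guard inside IndexTool argument validation (sketch only):
// if the index is LOCAL and the caller asked for HFile output, bail out,
// so the index is instead populated through the normal HBase write path
// and the server-side index coprocessors do the work.
if (pIndexTable.getIndexType() == PTable.IndexType.LOCAL && outputPath != null) {
    throw new IllegalArgumentException(
        "IndexTool cannot generate HFiles for a local index; "
        + "omit --output-path so index rows are written through the "
        + "front-door HBase APIs and built by the index coprocessors.");
}
{quote}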

> Building Local Index Asynchronously via IndexTool fails to populate index table
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2209
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2209
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.0
>         Environment: CDH: 5.4.4
> HBase: 1.0.0
> Phoenix: 4.5.0 (https://github.com/SiftScience/phoenix/tree/4.5-HBase-1.0) with hacks for CDH compatibility.
>            Reporter: Keren Gu
>            Assignee: Rajeshbabu Chintaguntla
>              Labels: IndexTool, LocalIndex, index
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2209.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Used the asynchronous index population tool (IndexTool) to create a local index (of 1 column) on tables with 10 columns and 65M, 250M, 340M, and 1.3B rows respectively.
> Table Schema as follows (with generic column names): 
> {quote}
> CREATE TABLE PH_SOJU_SHORT (
> id INT PRIMARY KEY,
> c2 VARCHAR NULL,
> c3 VARCHAR NULL,
> c4 VARCHAR NULL,
> c5 VARCHAR NULL,
> c6 VARCHAR NULL,
> c7 DOUBLE NULL,
> c8 VARCHAR NULL,
> c9 VARCHAR NULL,
> c10 BIGINT NULL
> )
> {quote}
> Example command used (for 65M row table): 
> {quote}
> 0: jdbc:phoenix:localhost> create local index LC_INDEX_SOJU_EVAL_FN on PH_SOJU_SHORT(C4) async;
> {quote}
> And MR job started with command: 
> {quote}
> $ hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table PH_SOJU_SHORT --index-table LC_INDEX_SOJU_EVAL_FN --output-path LC_INDEX_SOJU_EVAL_FN_HFILE
> {quote}
> The IndexTool MR jobs finished in 18 min, 77 min, 77 min, and 2 hr 34 min respectively, but all index tables were empty.
> For the table with 65M rows, IndexTool ran 12 mappers and reducers. MR counters show map input and output records = 65M, and reduce input and output records = 65M. PhoenixJobCounters input and output records are all 65M.
> IndexTool Reducer Log tail: 
> {quote}
> ...
> 2015-08-25 00:26:44,687 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 32 segments left of total size: 22805636866 bytes
> 2015-08-25 00:26:44,693 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2015-08-25 00:26:44,765 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 2015-08-25 00:26:44,908 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
> 2015-08-25 00:26:45,060 INFO [main] org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:36:43,880 INFO [main] org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2: Writer=hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/_temporary/attempt_1440094483400_5974_r_000000_0/0/496b926ad624438fa08626ac213d0f92, wrote=10737418236
> 2015-08-25 00:36:45,967 INFO [main] org.apache.hadoop.hbase.io.hfile.CacheConfig: CacheConfig:disabled
> 2015-08-25 00:38:43,095 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1440094483400_5974_r_000000_0 is done. And is in the process of committing
> 2015-08-25 00:38:43,123 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1440094483400_5974_r_000000_0 is allowed to commit now
> 2015-08-25 00:38:43,132 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1440094483400_5974_r_000000_0' to hdfs://nameservice/user/ubuntu/LC_INDEX_SOJU_EVAL_FN/_LOCAL_IDX_PH_SOJU_EVAL/_temporary/1/task_1440094483400_5974_r_000000
> 2015-08-25 00:38:43,158 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1440094483400_5974_r_000000_0' done.
> {quote}



