phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2890) Extend IndexTool to allow incremental index rebuilds
Date Thu, 08 Dec 2016 07:43:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731449#comment-15731449 ]

ASF GitHub Bot commented on PHOENIX-2890:
-----------------------------------------

Github user ankitsinghal commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/210#discussion_r91459367
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java ---
    @@ -167,50 +180,152 @@ private void printHelpAndExit(Options options, int exitCode) {
             formatter.printHelp("help", options);
             System.exit(exitCode);
         }
    +    
    +    class JobFactory {
    +        Connection connection;
    +        Configuration configuration;
    +        private Path outputPath;
     
    -    @Override
    -    public int run(String[] args) throws Exception {
    -        Connection connection = null;
    -        try {
    -            CommandLine cmdLine = null;
    -            try {
    -                cmdLine = parseOptions(args);
    -            } catch (IllegalStateException e) {
    -                printHelpAndExit(e.getMessage(), getOptions());
    +        public JobFactory(Connection connection, Configuration configuration, Path outputPath) {
    +            this.connection = connection;
    +            this.configuration = configuration;
    +            this.outputPath = outputPath;
    +
    +        }
    +
    +        public Job getJob(String schemaName, String indexTable, String dataTable, boolean useDirectApi) throws Exception {
    +            if (indexTable == null) {
    +                return configureJobForPartialBuild(schemaName, dataTable);
    +            } else {
    +                return configureJobForAysncIndex(schemaName, indexTable, dataTable, useDirectApi);
                 }
    -            final Configuration configuration = HBaseConfiguration.addHbaseResources(getConf());
    -            final String schemaName = cmdLine.getOptionValue(SCHEMA_NAME_OPTION.getOpt());
    -            final String dataTable = cmdLine.getOptionValue(DATA_TABLE_OPTION.getOpt());
    -            final String indexTable = cmdLine.getOptionValue(INDEX_TABLE_OPTION.getOpt());
    +        }
    +        
    +        private Job configureJobForPartialBuild(String schemaName, String dataTable) throws Exception {
                 final String qDataTable = SchemaUtil.getQualifiedTableName(schemaName, dataTable);
    -            final String qIndexTable = SchemaUtil.getQualifiedTableName(schemaName, indexTable);
    -
    +            final PTable pdataTable = PhoenixRuntime.getTable(connection, qDataTable);
                 connection = ConnectionUtil.getInputConnection(configuration);
    -            if (!isValidIndexTable(connection, qDataTable, indexTable)) {
    -                throw new IllegalArgumentException(String.format(
    -                    " %s is not an index table for %s ", qIndexTable, qDataTable));
    +            long minDisableTimestamp = HConstants.LATEST_TIMESTAMP;
    +            PTable indexWithMinDisableTimestamp = null;
    +            
    +            //Get Indexes in building state, minDisabledTimestamp 
    +            List<String> disableIndexes = new ArrayList<String>();
    +            List<PTable> disabledPIndexes = new ArrayList<PTable>();
    +            for (PTable index : pdataTable.getIndexes()) {
    +                if (index.getIndexState().equals(PIndexState.BUILDING)) {
    +                    disableIndexes.add(index.getTableName().getString());
    +                    disabledPIndexes.add(index);
    +                    if (minDisableTimestamp > index.getIndexDisableTimestamp()) {
    +                        minDisableTimestamp = index.getIndexDisableTimestamp();
    +                        indexWithMinDisableTimestamp = index;
    +                    }
    +                }
    +            }
    +            
    +            if (indexWithMinDisableTimestamp == null) {
    +                throw new Exception("There is no index for a datatable to be rebuild:" + qDataTable);
                 }
    +            if (minDisableTimestamp == 0) {
    +                throw new Exception("It seems Index " + indexWithMinDisableTimestamp
    +                        + " has disable timestamp as 0 , please run IndexTool with IndexName to build it first");
    +                // TODO probably we can initiate the job by ourself or can skip them while making the list for partial build with a warning
    +            }
    +            
    +            long maxTimestamp = getMaxRebuildAsyncDate(schemaName, disableIndexes);
    +            
    +            //serialize index maintainer in job conf with Base64 TODO: Need to find better way to serialize them in conf.
    +            List<IndexMaintainer> maintainers = Lists.newArrayListWithExpectedSize(disabledPIndexes.size());
    +            for (PTable index : disabledPIndexes) {
    +                maintainers.add(index.getIndexMaintainer(pdataTable, connection.unwrap(PhoenixConnection.class)));
    +            }
    +            ImmutableBytesWritable indexMetaDataPtr = new ImmutableBytesWritable(ByteUtil.EMPTY_BYTE_ARRAY);
    +            IndexMaintainer.serializeAdditional(pdataTable, indexMetaDataPtr, disabledPIndexes, connection.unwrap(PhoenixConnection.class));
    +            PhoenixConfigurationUtil.setIndexMaintainers(configuration, indexMetaDataPtr);
    +            
    +            //Prepare raw scan 
    +            Scan scan = IndexManagementUtil.newLocalStateScan(maintainers);
    +            scan.setTimeRange(minDisableTimestamp - 1, maxTimestamp);
    +            scan.setRaw(true);
    +            scan.setCacheBlocks(false);
    --- End diff --
    
    Yes @chrajeshbabu, I ran the test below and found that the performance of both scans (with and without cache) is the same; the file descriptor count during the test is also almost the same.
    
    // with cache on
    - create a new table
    - Insert 500k records
    - Flush the data
    - scan with cache on
    
    // with cache off
    - create a new table again
    - Insert 500k records
    - Flush the data
    - scan with cache off
    
    Do you want me to run some other tests to verify the GC impact as well, or is it fine for now? The cache is also kept off in org.apache.hadoop.hbase.mapreduce.Export by default.
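
    For reference, a minimal sketch of the cache-on vs. cache-off comparison described above, written against the plain HBase 1.x client API; the class name, table name, and column handling are illustrative assumptions, not taken from the actual test run:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.ConnectionFactory;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.ResultScanner;
        import org.apache.hadoop.hbase.client.Scan;
        import org.apache.hadoop.hbase.client.Table;

        public class ScanCacheBlocksComparison {

            // Times one full raw scan of the table, with block caching either on or off.
            static long timeScan(Connection conn, TableName table, boolean cacheBlocks) throws Exception {
                Scan scan = new Scan();
                scan.setRaw(true);                // raw scan, as in the partial-build path of the diff
                scan.setCacheBlocks(cacheBlocks); // the setting under discussion
                long start = System.currentTimeMillis();
                long rows = 0;
                try (Table t = conn.getTable(table); ResultScanner scanner = t.getScanner(scan)) {
                    for (Result r : scanner) {
                        rows++;
                    }
                }
                System.out.println("cacheBlocks=" + cacheBlocks + ", rows scanned=" + rows);
                return System.currentTimeMillis() - start;
            }

            public static void main(String[] args) throws Exception {
                // Assumes the table named in args[0] already holds the 500k flushed rows
                // from the steps above.
                Configuration conf = HBaseConfiguration.create();
                try (Connection conn = ConnectionFactory.createConnection(conf)) {
                    TableName table = TableName.valueOf(args[0]);
                    System.out.println("with cache:    " + timeScan(conn, table, true) + " ms");
                    System.out.println("without cache: " + timeScan(conn, table, false) + " ms");
                }
            }
        }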


> Extend IndexTool to allow incremental index rebuilds
> ----------------------------------------------------
>
>                 Key: PHOENIX-2890
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2890
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Minor
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-2890.patch, PHOENIX-2890_wip.patch
>
>
> Currently, IndexTool is used for the initial index rebuild, but I think we should extend it to be used for recovering an index from its last disabled timestamp too.
> In general terms, if we run IndexTool on an already existing/new index, it should follow the same semantics as the background index rebuilding thread.
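
As a rough illustration of the two modes described in the issue, here is a minimal sketch of driving IndexTool from code. It assumes the option names ("-dt", "-it", "-direct", "-op") match the tool's existing CLI options, and that, per the getJob() dispatch in the diff above, omitting the index table selects the new partial (incremental) build path; table names and paths are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.phoenix.mapreduce.index.IndexTool;

    public class IndexToolLauncher {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // Existing behaviour: full/async rebuild of one named index.
            ToolRunner.run(conf, new IndexTool(),
                    new String[] { "-dt", "MY_TABLE", "-it", "MY_INDEX", "-direct", "-op", "/tmp/my_index" });

            // New behaviour in this patch: no index table supplied, so IndexTool rebuilds
            // all indexes of MY_TABLE that are in BUILDING state, starting from the
            // smallest index disable timestamp (see configureJobForPartialBuild above).
            ToolRunner.run(conf, new IndexTool(),
                    new String[] { "-dt", "MY_TABLE", "-direct", "-op", "/tmp/my_index" });
        }
    }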



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
