jackrabbit-oak-issues mailing list archives

From Jörg Hoh (JIRA) <j...@apache.org>
Subject [jira] [Created] (OAK-7819) Improve logging for indexing progress
Date Thu, 11 Oct 2018 08:28:00 GMT
Jörg Hoh created OAK-7819:
-----------------------------

             Summary: Improve logging for indexing progress
                 Key: OAK-7819
                 URL: https://issues.apache.org/jira/browse/OAK-7819
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
    Affects Versions: 1.8.2
            Reporter: Jörg Hoh


At the moment I am trying to understand how I can improve the indexing performance of my RDB-based Oak setup.

Currently the indexing progress is logged like this:
{noformat}
10.10.2018 13:00:04.077 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing will be performed for following indexes: [/oak:index/nodetype]
10.10.2018 13:00:15.911 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #10000 <path> [666,60 nodes/s, 2399760,00 nodes/hr]
10.10.2018 13:00:21.792 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #20000 <path> [999,95 nodes/s, 3599820,00 nodes/hr]
10.10.2018 13:00:27.211 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #30000 <path> [1153,81 nodes/s, 4153707,69 nodes/hr]
10.10.2018 13:00:31.581 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #40000 <path> [1333,30 nodes/s, 4799880,00 nodes/hr]
...
10.10.2018 13:13:44.585 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #580000 <path> [704,74 nodes/s, 2537055,16 nodes/hr]
10.10.2018 13:14:04.738 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate
Reindexing Traversed #590000 <path> [699,88 nodes/s, 2519568,68 nodes/hr]
...
{noformat}
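The figures in brackets look like a cumulative average since reindexing started, not the current speed. For example, 580000 / 704.74 ≈ 823 s and 590000 / 699.88 ≈ 843 s both point back to a start around 13:00:01. The nodes/hr column is simply nodes/s × 3600 (666.60 × 3600 = 2399760). Because of this averaging, a slowdown only shows up gradually. The minimal sketch below computes the rate between two consecutive entries instead. The class and method names are hypothetical helpers for reading these logs, not part of Oak:

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Locale;

/** Hypothetical helper (not Oak code): throughput between two progress log entries. */
public class IntervalRate {
    /** Nodes per second between two (timestamp, traversed count) samples. */
    public static double nodesPerSecond(LocalTime t1, long count1, LocalTime t2, long count2) {
        double seconds = Duration.between(t1, t2).toNanos() / 1e9;
        return (count2 - count1) / seconds;
    }

    /** The log's nodes/hr column appears to be nodes/s multiplied by 3600. */
    public static double nodesPerHour(double nodesPerSecond) {
        return nodesPerSecond * 3600;
    }

    public static void main(String[] args) {
        // Entries #580000 and #590000 from the log above.
        double late = nodesPerSecond(LocalTime.parse("13:13:44.585"), 580_000,
                                     LocalTime.parse("13:14:04.738"), 590_000);
        System.out.printf(Locale.ROOT, "%.1f nodes/s%n", late); // prints "496.2 nodes/s"
    }
}
```

By this measure the interval between #580000 and #590000 ran at roughly 496 nodes/s, while the logged average still reports about 700 nodes/s. Early on, #10000 to #20000 ran at about 1700 nodes/s.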

But it isn't clear to me how much of the time is spent on
* fetching the nodes to be indexed from the repository (in our case residing in the RDB)
* the actual indexing computation
* storing the extracted index data

Having more detailed logging of these individual aspects could shed more light on the bottlenecks of this process.
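One way such a breakdown could be collected is to accumulate elapsed time per phase and append the split to each progress line. The minimal sketch below assumes the three phases listed above can be wrapped separately. `PhaseTimer` and its `Phase` values are hypothetical names, not an existing Oak API:

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-phase timer (not part of Oak): splits indexing time into fetch/index/store. */
public class PhaseTimer {
    public enum Phase { FETCH, INDEX, STORE }

    private final Map<Phase, Long> nanosByPhase = new EnumMap<>(Phase.class);

    public PhaseTimer() {
        for (Phase p : Phase.values()) {
            nanosByPhase.put(p, 0L);
        }
    }

    /** Runs the given work and charges its wall-clock time to the phase. */
    public <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            record(phase, System.nanoTime() - start);
        }
    }

    /** Adds an already measured duration (in nanoseconds) to the phase. */
    public void record(Phase phase, long nanos) {
        nanosByPhase.merge(phase, nanos, Long::sum);
    }

    /** Fraction (0..1) of all measured time that was spent in the phase. */
    public double share(Phase phase) {
        long total = 0;
        for (long n : nanosByPhase.values()) {
            total += n;
        }
        return total == 0 ? 0.0 : (double) nanosByPhase.get(phase) / total;
    }

    /** Suffix for a progress line, e.g. "fetch 60.0%, index 30.0%, store 10.0%". */
    public String summary() {
        return String.format(Locale.ROOT, "fetch %.1f%%, index %.1f%%, store %.1f%%",
                share(Phase.FETCH) * 100, share(Phase.INDEX) * 100, share(Phase.STORE) * 100);
    }
}
```

The percentages are only as accurate as the placement of the `time(...)` calls. If node reads from the RDB happen lazily during traversal, part of the fetch cost could be charged to the indexing phase unless the timers are placed carefully.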




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
