jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6381) Improved index analysis tools
Date Wed, 15 Nov 2017 11:09:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063301#comment-16063301 ]

Thomas Mueller edited comment on OAK-6381 at 11/15/17 11:08 AM:
----------------------------------------------------------------

svn.apache.org/r1799938 (trunk)
New method LuceneIndex.getFieldTermsInfo(path, field, max). 
* path is the index path (for example /oak:index/lucene), 
* field is the field name (for example ":path"), 
* max is the number of entries to list (for example 100)
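
To illustrate the kind of result such a method could return, here is a minimal sketch that counts term occurrences and keeps the "max" most frequent ones. It is not the Oak implementation: the class FieldTermsSketch and the method topTerms are hypothetical names, the terms are passed in as plain strings instead of being read from the Lucene index at "path", and the ordering (most frequent first, ties broken by term) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the Oak LuceneIndex code: it only shows
// how a "top terms for one field" listing could be computed.
class FieldTermsSketch {

    // Counts each term and returns at most `max` entries,
    // most frequent first; ties are ordered by term (assumed ordering).
    static List<Map.Entry<String, Long>> topTerms(Iterable<String> terms, int max) {
        Map<String, Long> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int limit = Math.min(Math.max(max, 0), entries.size());
        return new ArrayList<>(entries.subList(0, limit));
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the terms of one field.
        List<String> sample = Arrays.asList("/content", "/apps", "/content", "/libs", "/content", "/apps");
        for (Map.Entry<String, Long> e : topTerms(sample, 2)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

In the real method the terms would come from the index at the given path rather than from an in-memory list.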


was (Author: tmueller):
svn.apache.org/r1799938 (trunk)
New method LuceneIndex.getFieldTerms(path, field, max). 
* path is the index path (for example /oak:index/lucene), 
* field is the field name (for example ":path"), 
* max is the number of entries to list (for example 100)

> Improved index analysis tools
> -----------------------------
>
>                 Key: OAK-6381
>                 URL: https://issues.apache.org/jira/browse/OAK-6381
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.8
>
>
> It would be good to have more tools to analyze indexes:
> * For Lucene indexes, get a histogram of samples (terms). We have "getFieldInfo", which shows how common each field is, but we have nothing equivalent for terms. For example, the /oak:index/lucene index contains 1 million fulltext fields and node names for 1 million nodes, but I wonder why, what typical node names are, and whether the fulltext for most nodes is actually empty. Maybe add a new method "getTermHistogram(int sampleCount)" or similar.
> * For property indexes, the number of updated nodes per second or so. Right now we can only analyze the counts per key, but some indexes / keys are very volatile (they see many short-lived entries).
> * For Lucene indexes, writes per second or so (in MB).
> * How indexes are used (approximate nodes read / MB per hour).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
