jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-6584) Add tooling API
Date Fri, 01 Sep 2017 09:23:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150268#comment-16150268 ]

Michael Dürig commented on OAK-6584:

While starting an initial implementation of the tooling API in Oak, I had to make a couple
of [adjustments|https://github.com/mduerig/oak-tooling-api/commits/master] here and there.
Most notably, I added support for the {{IOMonitor}} so tools can monitor segment reads and writes.
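To illustrate what monitoring segment reads and writes could give a tool, here is a minimal sketch of a counting listener. Oak's real {{IOMonitor}} has its own callbacks and parameters; the {{SegmentIOListener}} interface and {{CountingIOListener}} class below are hypothetical stand-ins that only show the idea of accumulating I/O totals from callbacks.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a segment I/O callback interface. It is NOT Oak's
// IOMonitor, whose method names and parameters differ.
interface SegmentIOListener {
    void afterSegmentRead(int lengthBytes, long elapsedNanos);

    void afterSegmentWrite(int lengthBytes, long elapsedNanos);
}

// Accumulates totals so a tool can report how much segment I/O it caused.
// AtomicLong keeps the counters correct if callbacks arrive from several threads.
final class CountingIOListener implements SegmentIOListener {
    final AtomicLong reads = new AtomicLong();
    final AtomicLong bytesRead = new AtomicLong();
    final AtomicLong writes = new AtomicLong();
    final AtomicLong bytesWritten = new AtomicLong();
    final AtomicLong ioNanos = new AtomicLong();

    @Override
    public void afterSegmentRead(int lengthBytes, long elapsedNanos) {
        reads.incrementAndGet();
        bytesRead.addAndGet(lengthBytes);
        ioNanos.addAndGet(elapsedNanos);
    }

    @Override
    public void afterSegmentWrite(int lengthBytes, long elapsedNanos) {
        writes.incrementAndGet();
        bytesWritten.addAndGet(lengthBytes);
        ioNanos.addAndGet(elapsedNanos);
    }
}
```

A tool would register such a listener before running its analysis and print the counters afterwards, which makes the I/O cost of an analysis visible without touching segment store internals.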

I started to implement this on a [branch|https://github.com/mduerig/jackrabbit-oak/tree/OAK-6584]
in my GitHub fork. The way I intend this to work for now is that users of the API
pass an instance of a {{FileStoreWrapper.Builder}} to {{FileStoreBuilder.withToolAccess()}}.
{{FileStoreBuilder.build()}} initialises the {{FileStoreWrapper.Builder}} with all required
information. Calling {{FileStoreWrapper.Builder.build()}} subsequently returns an implementation
of {{Store}}. This approach ensures that you need to explicitly set up your file
store for tool access. There is no way to get this access on an already instantiated file
store (e.g. a running production instance).
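The hand-over between the two builders described above can be sketched as follows. The classes reuse the names from the comment, but they are hypothetical, heavily reduced stand-ins (a {{Store}} that only exposes its directory, a {{FileStoreBuilder.build()}} that opens nothing); they model only the wiring, not Oak's actual classes or signatures.

```java
import java.io.File;

// Hypothetical stand-ins named after the classes in the comment. They only
// model the hand-over between the builders; Oak's real classes differ.
interface Store {
    File directory(); // placeholder for whatever the tooling API exposes
}

final class FileStoreWrapper implements Store {
    private final File directory;

    private FileStoreWrapper(File directory) {
        this.directory = directory;
    }

    @Override
    public File directory() {
        return directory;
    }

    static final class Builder {
        private File directory; // filled in by FileStoreBuilder.build(), never by the tool

        // Package-private, so code outside this package cannot inject state directly.
        void initialise(File directory) {
            this.directory = directory;
        }

        Store build() {
            if (directory == null) {
                throw new IllegalStateException(
                        "pass this builder to FileStoreBuilder.withToolAccess() and build the store first");
            }
            return new FileStoreWrapper(directory);
        }
    }
}

final class FileStoreBuilder {
    private final File directory;
    private FileStoreWrapper.Builder toolAccess; // stays null unless a tool opts in

    FileStoreBuilder(File directory) {
        this.directory = directory;
    }

    FileStoreBuilder withToolAccess(FileStoreWrapper.Builder builder) {
        this.toolAccess = builder;
        return this;
    }

    void build() {
        // A real builder would open the segment store here and return it.
        if (toolAccess != null) {
            toolAccess.initialise(directory);
        }
    }
}
```

Usage: create a {{FileStoreWrapper.Builder}}, pass it to {{withToolAccess()}}, call {{build()}} on the file store builder, then call {{build()}} on the wrapper builder. Because the only way to populate the wrapper builder runs through {{FileStoreBuilder.build()}}, a store that was built without opting in offers no route to a {{Store}}.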

A couple of open questions remain:
* How should we deal with exceptions, most notably the dreaded {{IOException}}? So far I would
prefer not to pollute the tooling API with throws clauses but to wrap these into unchecked exceptions
(I'm using {{ISE}} for now). This is a compromise between usability of the API and runtime
stability. Since the target use case is tools, I think it should be fine.
* How do we deal with {{FileStoreWrapper.journalEntries()}}, which relies on an underlying
iterator that needs closing (see FIXME)? Should we eagerly copy all entries? Should we return
a {{Journal implements Closeable}} instead of just an {{Iterable}} from that method? Should
we automatically close the underlying iterator once it is fully unspooled (and what about iterators
that are never fully consumed)? Should we just ignore this case?
* All methods involving reading from segments and records directly are currently still unimplemented
(see FIXME). I haven't figured out a clean way to implement them yet (though I didn't try very hard).
[~frm], maybe you can share some ideas here, since you have been working mostly in this area lately.
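One way the first two open questions could fit together: return a closeable {{Journal}} and have its {{close()}} wrap the checked {{IOException}} in an {{IllegalStateException}}, matching the "ISE for now" convention. This is a hypothetical sketch, not Oak code; the constructor taking an iterator plus a {{Closeable}} resource is an assumption made to keep the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical sketch of the "Journal implements Closeable" option; not part of Oak.
final class Journal implements Iterable<String>, Closeable {
    private final Iterator<String> entries; // stands in for the journal reader's iterator
    private final Closeable resource;       // whatever must be released, e.g. an open file
    private boolean closed;

    Journal(Iterator<String> entries, Closeable resource) {
        this.entries = entries;
        this.resource = resource;
    }

    // Single-use Iterable: always hands out the one underlying iterator.
    @Override
    public Iterator<String> iterator() {
        if (closed) {
            throw new IllegalStateException("journal already closed");
        }
        return entries;
    }

    // Idempotent. Declares no checked exception: an IOException from the
    // underlying resource is wrapped in an unchecked IllegalStateException.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        try {
            resource.close();
        } catch (IOException e) {
            throw new IllegalStateException("failed to close journal", e);
        }
    }
}
```

With try-with-resources the resource is released even when a tool reads only the first few entries, which sidesteps the question of what happens to iterators that are never fully unspooled. The JDK's {{java.io.UncheckedIOException}} would be an alternative to {{IllegalStateException}} for the wrapping.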

> Add tooling API
> ---------------
>                 Key: OAK-6584
>                 URL: https://issues.apache.org/jira/browse/OAK-6584
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: tooling
>             Fix For: 1.8
> h3. Current situation
> Current segment store related tools are implemented ad hoc and potentially rely on
> internal implementation details of Oak Segment Tar. This makes those tools less useful, portable,
> stable and widely applicable than they should be.
> h3. Goal
> Provide a common and sufficiently stable Oak Tooling API for implementing segment store
> related tools. The API should be independent of Oak and not available for normal production
> use of Oak. Specifically, it should not be possible to use it to implement production features,
> and production features must not rely on it. It must be possible to implement the Oak Tooling
> API in Oak 1.8, and it should be possible for Oak 1.6.
> h3. Typical use cases
> * Query the number of nodes / properties / values in a given path satisfying some criteria
> * Aggregate a certain value on queries like the above
> * Calculate the size of the content / size on disk
> * Analyse changes, e.g. how many binaries bigger than a certain threshold were added / removed
> between two given revisions, and what is the sum of their sizes?
> * Analyse locality: measure the locality of node states. Incident plots (see https://issues.apache.org/jira/browse/OAK-5655?focusedCommentId=15865973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15865973).
> * Analyse the level of deduplication (e.g. of checkpoints)
> h3. Validation
> Reimplement [Script Oak|https://github.com/mduerig/script-oak] on top of the tooling API.
> h3. API draft
> * Whiteboard shot of the [API entities|https://wiki.apache.org/jackrabbit/Oakathon%20August%202017?action=AttachFile&do=view&target=IMG_20170822_163256.jpg]
> identified initially.
> * Further [drafting of the API|https://github.com/mduerig/oak-tooling-api] takes place
> on GitHub for now. We'll move it to the Apache SVN as soon as it is considered mature enough and
> there is consensus on where best to move it.
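The first typical use cases in the description (counting nodes and properties under a path that satisfy some criteria, aggregating a value over them, calculating content size) all reduce to a tree traversal with a predicate. The sketch below uses a hypothetical in-memory model ({{SimpleNode}}, {{SimpleProperty}}) instead of the draft API, and an example criterion of property size above 1 KiB; a real tool would walk node states through the tooling API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model; not the draft Oak Tooling API.
final class SimpleProperty {
    final String name;
    final long size; // bytes

    SimpleProperty(String name, long size) {
        this.name = name;
        this.size = size;
    }
}

final class SimpleNode {
    final List<SimpleProperty> properties;
    final List<SimpleNode> children;

    SimpleNode(List<SimpleProperty> properties, List<SimpleNode> children) {
        this.properties = properties;
        this.children = children;
    }
}

// Counts nodes and properties below a root and aggregates the sizes of matching properties.
final class ContentStats {
    long nodes;
    long properties;
    long matchingProperties;
    long matchingBytes;

    static ContentStats collect(SimpleNode root, Predicate<SimpleProperty> criterion) {
        ContentStats stats = new ContentStats();
        // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
        Deque<SimpleNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            SimpleNode node = pending.pop();
            stats.nodes++;
            for (SimpleProperty property : node.properties) {
                stats.properties++;
                if (criterion.test(property)) {
                    stats.matchingProperties++;
                    stats.matchingBytes += property.size;
                }
            }
            for (SimpleNode child : node.children) {
                pending.push(child);
            }
        }
        return stats;
    }
}
```

For example, {{ContentStats.collect(root, p -> p.size > 1024)}} returns the node and property counts together with the number and total size of properties larger than 1 KiB. Swapping the predicate or the aggregated field covers the other query-style use cases.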

This message was sent by Atlassian JIRA
