bloodhound-dev mailing list archives

From Greg Stein <>
Subject Re: [Proposal] Core Bloodhound - basic concepts
Date Mon, 12 Mar 2018 02:00:23 GMT
On Sun, Mar 11, 2018 at 4:21 PM, Gary <> wrote:

> How tickets are stored as a whole is also worth tackling, which is what #2
> is trying to broadly decide. Here I am suggesting that the state of a
> ticket can be built up from a query that collects all the change events and
> applies the deltas in order. This could prove to be a slow process so at
> some point we may want to look at keeping a record of the current state or
> a checkpoint but, given that ticket views are expected to show the history,
> these are details that are required anyway. I suggest that we can look at
> optimising for speed later. Knowing that we can update a ticket from deltas
> may be useful.
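
For illustration, the replay idea quoted above might look roughly like this. It is only a sketch: the ChangeEvent record, its seq ordering field, and the current_state helper are hypothetical names, not anything from Trac or Bloodhound.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """Hypothetical change record: one set of field changes on one ticket."""
    ticket_id: int
    seq: int        # assumed ordering key (could equally be a timestamp)
    changes: dict   # field name -> new value


def current_state(events, ticket_id):
    """Rebuild a ticket by applying its change events in order."""
    state = {}
    mine = (e for e in events if e.ticket_id == ticket_id)
    for event in sorted(mine, key=lambda e: e.seq):
        state.update(event.changes)
    return state


events = [
    ChangeEvent(1, 2, {"status": "in progress", "owner": "gary"}),
    ChangeEvent(1, 1, {"summary": "Crash on save", "status": "open"}),
    ChangeEvent(2, 1, {"summary": "Unrelated ticket"}),
]
print(current_state(events, 1))
```

The same ordered list of events is what a ticket view would show as history, which is why keeping it costs nothing extra.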

The Apache Subversion team has a LOT of experience in this kind of storage

When we started, we stored the "latest" in full text, and then had a series
of "reverse deltas" to go backwards in time. We switched that around, and
store the original in full text, and then apply "forward" deltas to reach

We have two mechanisms to speed this up. The first is a sophisticated cache
system: invariably, "latest" will be cached and fast to retrieve. The
second, and the more important part of our "series of deltas" system, is
"skip deltas", which let us rapidly construct any point in history. This is
analogous to the "skip list"[1] concept: we assemble N deltas into a single
delta, allowing us to skip over many deltas with a single application.
Generally speaking, if there are M deltas in a file's entire history, then
we can reassemble any point in history by applying O(log M) deltas.

(note we used the skip delta mechanism for both directions; it is effective
in both directions)
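
A rough sketch of the skip-delta idea, applied to ticket-like dictionaries rather than file contents. The assumptions here are mine: revision n is stored as a delta against n with its lowest set bit cleared, values are never None (None marks a removed field), and SkipDeltaStore and its methods are hypothetical names. This is one simple scheme, not a description of Subversion's actual storage code.

```python
def skip_base(n):
    """Assumed base for revision n: n with its lowest set bit cleared."""
    return n & (n - 1)


def make_delta(old, new):
    """Delta turning old into new; None marks a removed field."""
    delta = {k: v for k, v in new.items() if old.get(k) != v}
    delta.update({k: None for k in old if k not in new})
    return delta


def apply_delta(state, delta):
    out = dict(state)
    for key, value in delta.items():
        if value is None:
            out.pop(key, None)
        else:
            out[key] = value
    return out


class SkipDeltaStore:
    def __init__(self, initial):
        self.fulltext = dict(initial)  # revision 0, stored in full
        self.deltas = {}               # revision n -> delta against skip_base(n)

    def commit(self, new_state):
        n = len(self.deltas) + 1
        base_state, _ = self.checkout(skip_base(n))
        self.deltas[n] = make_delta(base_state, new_state)
        return n

    def checkout(self, n):
        """Return (state at revision n, number of deltas applied)."""
        chain = []
        while n > 0:
            chain.append(n)
            n = skip_base(n)
        state = dict(self.fulltext)
        for rev in reversed(chain):  # oldest delta first
            state = apply_delta(state, self.deltas[rev])
        return state, len(chain)


store = SkipDeltaStore({"status": "new"})
for i in range(1, 101):
    store.commit({"status": "open" if i % 2 else "closed", "edits": i})
state, applied = store.checkout(100)
print(state, applied)
```

With this base choice the chain for revision 100 is 100, 96, 64, so three deltas are applied instead of one hundred. The trade-off is that each stored delta spans more changes and is therefore larger.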

> Finally, I am suggesting in #3 that as much as possible we generalise
> ticket categorisation. Categorisation is a central concept to capture and,
> in trac we inherited categorisations such as statuses (open, in progress,
> etc), types (bug, enhancement, task, etc), milestones, versions, etc, and
> these had separate implementations.

When my team designed the issue tracker for Google Code's project hosting,
we did the same thing. Most issue trackers are highly structured, with a
field for this and that. It complicates issue creation, issue management,
querying, etc. I recall sitting down with 20 fields from a typical Bugzilla
install, and reducing it to about 8, where we used labels for "everything".
We did not bother with "issue types".

A second thing that we did is allow labels such as "milestone-14", and then
enable our "list display" to be configured for a "milestone" column that
would extract any labels that started with "milestone-", showing just the

It made for a very robust, easy to understand, and flexible metadata system
for each issue.
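
A minimal sketch of the prefix-label column described above. The column_values helper and the issue layout are hypothetical, and it assumes the column shows whatever follows the "milestone-" prefix; it is not the Google Code implementation.

```python
def column_values(labels, column):
    """Return the parts of labels that follow '<column>-' (case-insensitive)."""
    prefix = column.lower() + "-"
    return [
        label[len(prefix):]
        for label in labels
        if label.lower().startswith(prefix)
    ]


issues = {
    101: ["milestone-14", "priority-high", "security"],
    102: ["Milestone-15", "priority-low"],
    103: ["docs"],
}

# Build a "milestone" column for a list display from free-form labels.
for issue_id, labels in issues.items():
    print(issue_id, ", ".join(column_values(labels, "milestone")) or "-")
```

Because the column is derived from labels at display time, adding a new "field" needs no schema change, only a new label prefix.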

You can still see some of these design choices a decade later in Monorail,
a descendant of our tracker, now being used by the Chromium project.
