lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5611) Simplify the default indexing chain
Date Mon, 28 Apr 2014 15:02:16 GMT


Robert Muir commented on LUCENE-5611:

In StoredFieldsWriter:

- *   <li>For every document, {@link #startDocument(int)} is called,
+ *   <li>For every document, {@link #startDocument()} is called,
  *       informing the Codec how many fields will be written.

This javadoc still "compiles", but it no longer makes sense, because we don't pass numFields as a parameter anymore.
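Here is a minimal sketch of what a consistent contract could look like once numFields is gone. SketchStoredFieldsWriter, CountingStoredFieldsWriter and the startDocument()/writeField/finishDocument() sequence are hypothetical stand-ins, not Lucene's StoredFieldsWriter API. The javadoc only promises that startDocument() is called once per document, and the writer discovers the field count itself as fields arrive.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a stored-fields writer (NOT Lucene's
 * StoredFieldsWriter). The caller never announces a field count up front.
 */
abstract class SketchStoredFieldsWriter {
  /** Called once per document, before any of its stored fields are written. */
  abstract void startDocument();

  /** Called once for every stored field of the current document. */
  abstract void writeField(String name, String value);

  /** Called once per document, after all of its stored fields are written. */
  abstract void finishDocument();
}

/** Records how many fields each document had, counting them while writing. */
class CountingStoredFieldsWriter extends SketchStoredFieldsWriter {
  private final List<Integer> fieldCounts = new ArrayList<>();
  private int currentCount = -1; // -1 means "no document in progress"

  @Override
  void startDocument() {
    if (currentCount != -1) throw new IllegalStateException("previous document not finished");
    currentCount = 0;
  }

  @Override
  void writeField(String name, String value) {
    if (currentCount == -1) throw new IllegalStateException("startDocument() not called");
    currentCount++; // the count is learned here, not passed to startDocument()
  }

  @Override
  void finishDocument() {
    if (currentCount == -1) throw new IllegalStateException("startDocument() not called");
    fieldCounts.add(currentCount);
    currentCount = -1;
  }

  List<Integer> fieldCounts() {
    return fieldCounts;
  }
}

class StoredFieldsProtocolDemo {
  public static void main(String[] args) {
    CountingStoredFieldsWriter w = new CountingStoredFieldsWriter();
    w.startDocument();
    w.writeField("title", "Geonames");
    w.writeField("id", "42");
    w.finishDocument();
    System.out.println(w.fieldCounts()); // prints [2]
  }
}
```

With this shape, the javadoc line would simply say that startDocument() is called for every document, without claiming it tells the codec how many fields follow.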

The attribute handling in the indexing chain got more confusing and complicated. Can we factor
this into FieldInvertState?

It's bogus that we call hasAttribute + getAttribute: besides making the code more complicated, it's
two hashmap lookups per attribute, for 2 atts. We should add a method to AttributeSource that acts like
Map.get (returns the attribute, or null if it doesn't exist), or simply change the semantics
of getAttribute to do that. This can be a followup issue.
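Here is a minimal sketch of both suggestions together: a Map.get-style lookup that costs one hash probe, and a per-field state object that resolves optional attributes once when a token stream is attached. SketchAttributeSource, getAttributeOrNull, SketchInvertState, OffsetAtt and PayloadAtt are all hypothetical names. They are not Lucene's AttributeSource or FieldInvertState API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical attribute registry keyed by attribute type (NOT Lucene's AttributeSource). */
class SketchAttributeSource {
  private final Map<Class<?>, Object> attributes = new HashMap<>();

  <T> void put(Class<T> type, T instance) {
    attributes.put(type, instance);
  }

  /** Check-then-fetch style: used together with getAttribute, this hashes the key twice. */
  boolean hasAttribute(Class<?> type) {
    return attributes.containsKey(type);
  }

  <T> T getAttribute(Class<T> type) {
    Object att = attributes.get(type);
    if (att == null) throw new IllegalArgumentException("no attribute " + type.getName());
    return type.cast(att);
  }

  /** Proposed style: behaves like Map.get, returning the attribute or null (one lookup). */
  <T> T getAttributeOrNull(Class<T> type) {
    return type.cast(attributes.get(type)); // Class.cast(null) returns null
  }
}

/** Hypothetical attribute types standing in for offset/payload attributes. */
final class OffsetAtt { int start, end; }
final class PayloadAtt { byte[] bytes; }

/** Hypothetical per-field state that owns the attribute handling. */
class SketchInvertState {
  OffsetAtt offsetAtt;   // null if the stream has no offsets
  PayloadAtt payloadAtt; // null if the stream has no payloads

  void setAttributeSource(SketchAttributeSource source) {
    // Check-then-fetch would be:
    //   offsetAtt = source.hasAttribute(OffsetAtt.class) ? source.getAttribute(OffsetAtt.class) : null;
    // The Map.get-style call below does the same with a single lookup.
    offsetAtt = source.getAttributeOrNull(OffsetAtt.class);
    payloadAtt = source.getAttributeOrNull(PayloadAtt.class);
  }
}

class AttributeLookupDemo {
  public static void main(String[] args) {
    SketchAttributeSource source = new SketchAttributeSource();
    source.put(OffsetAtt.class, new OffsetAtt());
    SketchInvertState state = new SketchInvertState();
    state.setAttributeSource(source);
    System.out.println(state.offsetAtt != null);  // true
    System.out.println(state.payloadAtt == null); // true
  }
}
```

Keeping the resolved references on the state object means the per-token loop only does null checks, and the lookup logic lives in one place instead of being spread through the chain.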

I will keep reviewing; I only got through the first 3 or 4 files in the patch.

> Simplify the default indexing chain
> -----------------------------------
>                 Key: LUCENE-5611
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>         Attachments: LUCENE-5611.patch, LUCENE-5611.patch
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes, e.g. the new chain requires all fields with the same
> name to have the same term vector (TV) options, rather than
> auto-upgrading them as the current chain does...
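The behavior change in the last quoted paragraph can be sketched as two merge policies for the term vector options of fields that share a name. This assumes that "auto-upgrading" means enabling an option if any instance of the field enables it (a union of flags). TvOptions and its methods are hypothetical and are not Lucene's FieldType or FieldInfo API.

```java
/** Hypothetical term vector options for one field instance (NOT Lucene's FieldType). */
final class TvOptions {
  final boolean vectors, positions, offsets;

  TvOptions(boolean vectors, boolean positions, boolean offsets) {
    this.vectors = vectors;
    this.positions = positions;
    this.offsets = offsets;
  }

  /** Upgrade policy (assumed): OR the flags, so the field gets the widest options seen. */
  TvOptions upgradeWith(TvOptions other) {
    return new TvOptions(vectors || other.vectors,
        positions || other.positions,
        offsets || other.offsets);
  }

  /** Strict policy: every instance of the same field name must agree exactly. */
  TvOptions requireSame(String fieldName, TvOptions other) {
    if (vectors != other.vectors || positions != other.positions || offsets != other.offsets) {
      throw new IllegalArgumentException(
          "field \"" + fieldName + "\": inconsistent term vector options");
    }
    return this;
  }
}

class TvPolicyDemo {
  public static void main(String[] args) {
    TvOptions first = new TvOptions(true, false, false);
    TvOptions second = new TvOptions(true, true, false);
    System.out.println(first.upgradeWith(second).positions); // true
    try {
      first.requireSame("body", second);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // field "body": inconsistent term vector options
    }
  }
}
```

The strict policy is simpler to implement because the chain never has to revisit earlier instances of a field. The trade-off is that documents which previously indexed fine with mixed options would now be rejected.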

This message was sent by Atlassian JIRA
