lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1292) Tag Index
Date Wed, 21 May 2008 14:19:55 GMT


Jason Rutherglen commented on LUCENE-1292:

It looks like the hook into IndexWriter needs to get the doc id from DocumentsWriterThreadState
in DocumentsWriter.updateDocument(Document doc, Analyzer analyzer, Term delTerm).

The flush-segment hook looks like it needs a callback from DocumentsWriter.flush(boolean closeDocStore).
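
As a rough illustration of the two hooks mentioned above, here is a minimal sketch. Everything in it is hypothetical: `IndexingHook`, `SketchWriter`, and `RecordingHook` are names made up for this example and are not part of Lucene. `SketchWriter` only stands in for the points where DocumentsWriter.updateDocument and DocumentsWriter.flush would invoke the callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; not part of Lucene's API.
interface IndexingHook {
  // Called once the writer has assigned a doc id to a document.
  void docIdAssigned(int docId);

  // Called when buffered documents are flushed as a segment.
  void segmentFlushed(int docCount);
}

// Stand-in for the writer side. It only models where the callbacks would fire.
class SketchWriter {
  private final IndexingHook hook;
  private int nextDocId = 0;

  SketchWriter(IndexingHook hook) {
    this.hook = hook;
  }

  // Assigns the next doc id and reports it to the hook.
  int updateDocument(String doc) {
    int docId = nextDocId++;
    hook.docIdAssigned(docId);
    return docId;
  }

  // Reports how many documents have been assigned ids so far.
  void flush() {
    hook.segmentFlushed(nextDocId);
  }
}

// Example listener that a tag writer could use to stay in step with doc ids.
class RecordingHook implements IndexingHook {
  final List<Integer> docIds = new ArrayList<>();
  int flushedDocCount = -1;

  @Override
  public void docIdAssigned(int docId) {
    docIds.add(docId);
  }

  @Override
  public void segmentFlushed(int docCount) {
    flushedDocCount = docCount;
  }
}
```

With callbacks along these lines, a tag writer could record tags against the same doc ids the main index assigns and write its own data whenever a segment is flushed.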

> Tag Index
> ---------
>                 Key: LUCENE-1292
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
> The tag index addresses three problems: slow field cache loading, slow range queries, and
> having to reindex an entire document to update fields that are not tokenized.
> The tag index holds untokenized terms with a docfreq of 1 in a term-dictionary-like index
> file.  The file also stores the docs per term, similar to LUCENE-1278.  The index also has
> a transaction log and an in-memory index for realtime updates to the tags.  The transaction
> log is periodically merged into the existing tag term dictionary index file.
> The TagIndexReader extends IndexReader and is unified with a regular index by ParallelReader.
> There is a doc id to terms skip pointer file for the IndexReader.document method.  This file
> contains a pointer for looking up the terms for a document.
> There is a higher-level class that encapsulates writing a document with tag fields to
> IndexWriter and TagIndexWriter.  This requires a hook into IndexWriter to coordinate doc ids
> and flushing segments to disk.
> The writer class could be as simple as:
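> {code}
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.DocIdSetIterator;
>
> public class TagIndexWriter {
>   // Adds the tag term to every doc id the iterator yields.
>   public void add(Term term, DocIdSetIterator iterator) {
>   }
>   // Removes the tag term from every doc id the iterator yields.
>   public void delete(Term term, DocIdSetIterator iterator) {
>   }
> }
> {code}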
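
To make the transaction log idea in the quoted description more concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration, not the proposed implementation: `TagIndexSketch` and its methods are hypothetical names, terms are plain strings rather than Lucene Term objects, and a TreeMap stands in for the on-disk tag term dictionary file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a sorted map plays the role of the tag term dictionary
// file, and a list plays the role of the transaction log.
class TagIndexSketch {
  // One logged update: add or delete a tag on a single document.
  private static final class LogEntry {
    final boolean add;
    final String term;
    final int docId;

    LogEntry(boolean add, String term, int docId) {
      this.add = add;
      this.term = term;
      this.docId = docId;
    }
  }

  // Merged state: term -> sorted doc ids (stand-in for the index file).
  private final TreeMap<String, TreeSet<Integer>> merged = new TreeMap<>();
  // Updates not yet merged into the dictionary.
  private final List<LogEntry> log = new ArrayList<>();

  void add(String term, int docId) {
    log.add(new LogEntry(true, term, docId));
  }

  void delete(String term, int docId) {
    log.add(new LogEntry(false, term, docId));
  }

  // Realtime view: merged postings with pending log entries replayed on top.
  SortedSet<Integer> docs(String term) {
    TreeSet<Integer> result = new TreeSet<>(merged.getOrDefault(term, new TreeSet<>()));
    for (LogEntry e : log) {
      if (e.term.equals(term)) {
        apply(result, e);
      }
    }
    return result;
  }

  // Periodic merge: fold the log into the dictionary, then clear the log.
  void mergeLog() {
    for (LogEntry e : log) {
      TreeSet<Integer> docs = merged.computeIfAbsent(e.term, t -> new TreeSet<>());
      apply(docs, e);
      if (docs.isEmpty()) {
        merged.remove(e.term);
      }
    }
    log.clear();
  }

  int pendingUpdates() {
    return log.size();
  }

  private static void apply(TreeSet<Integer> docs, LogEntry e) {
    if (e.add) {
      docs.add(e.docId);
    } else {
      docs.remove(e.docId);
    }
  }
}
```

Lookups see tag updates immediately because pending log entries are replayed at read time, while mergeLog keeps the log short. A real implementation would also need durability for the log and the segment coordination discussed in the comment above.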

