lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1292) Tag Index
Date Thu, 22 May 2008 22:31:55 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599224#action_12599224
] 

Jason Rutherglen commented on LUCENE-1292:
------------------------------------------

Tag in the sense you described.  In Lucene, an untokenized field.  Yes, a
separate index that is dynamic while the main one is static except deletes.
The main problem, though not impossible, is syncing the segments.  The tag
index needs the new segment docmap and a callback for each merge in
SegmentMerger.  The write to the tag index needs the doc id for each
IndexWriter.addDocument call and a callback for each subsequent flush to
disk.

These event listeners can be used for other customized indexes.


On Thu, May 22, 2008 at 6:09 PM, Otis Gospodnetic (JIRA) <jira@apache.org>



> Tag Index
> ---------
>
>                 Key: LUCENE-1292
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1292
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>
> The problem the tag index solves is slow field cache loading and range queries, and reindexing
an entire document to update fields that are not tokenized.  
> The tag index holds untokenized terms with a docfreq of 1 in a term dictionary like index
file.  The file also stores the docs per term, similar to LUCENE-1278.  The index also has
a transaction log and in memory index for realtime updates to the tags.  The transaction log
is periodically merged into the existing tag term dictionary index file.
> The TagIndexReader extends IndexReader and is unified with a regular index by ParallelReader.
 There is a doc id to terms skip pointer file for the IndexReader.document method.  This file
contains a pointer for looking up the terms for a document.  
> There is a higher level class that encapsulates writing a document with tag fields to
IndexWriter and TagIndexWriter.  This requires a hook into IndexWriter to coordinate doc ids
and flushing segments to disk.  
> The writer class could be as simple as:
> {code}
> public class TagIndexWriter {
>   
>   public void add(Term term, DocIdSetIterator iterator) {
>   }
>   
>   public void delete(Term term, DocIdSetIterator iterator) {
>   }
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message