lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5123) invert the codec postings API
Date Sun, 08 Sep 2013 22:40:52 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761540#comment-13761540 ]

Michael McCandless commented on LUCENE-5123:
--------------------------------------------

{quote}
1. move write() from PostingsFormat to FieldsConsumer
2. make the "push" api a subclass of FieldsConsumer that has a final implementation of write()
and exposes the abstract api it has today (e.g. addField)
{quote}

I started down this path (moved the write() method to FieldsConsumer, and created a PushFieldsConsumer
subclass that implements a final write() and exposes the current API), but this causes problems for
wrapping/delegating PostingsFormats (e.g. AssertingPF, BloomPF, PulsingPF), since suddenly
they must be strongly typed to accept only PushFieldsConsumer.  Either that, or I guess we
could cut each of these over to write().

I mean, it exposes a real issue w/ the current patch: you cannot wrap SimpleTextPF (or any
future PF that uses the pull API) inside these PFs that use the push API.  Not sure what to
do ...
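
To make the wrapping problem concrete, here is a minimal sketch. The classes below are hypothetical, heavily simplified stand-ins that only borrow the names used in this comment (FieldsConsumer, PushFieldsConsumer, write, addField); they are not Lucene's real classes or signatures, and a "field" is reduced to a plain String. It shows a push-style wrapper that is typed to accept only push consumers, and a wrapper cut over to write() that can delegate to either kind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Lucene's real API.

/** Pull side: the consumer is handed all fields and iterates them itself. */
abstract class FieldsConsumer {
    abstract void write(List<String> fields);
}

/** Push side: write() is final and drives the per-field callback. */
abstract class PushFieldsConsumer extends FieldsConsumer {
    @Override
    final void write(List<String> fields) {
        for (String field : fields) {
            addField(field);
        }
    }

    abstract void addField(String field);
}

/** A push-style consumer that records every field pushed to it. */
class RecordingPushConsumer extends PushFieldsConsumer {
    final List<String> seen = new ArrayList<>();

    @Override
    void addField(String field) {
        seen.add(field);
    }
}

/** A pull-only consumer: it implements write() and has no addField(). */
class PullOnlyConsumer extends FieldsConsumer {
    int fieldCount;

    @Override
    void write(List<String> fields) {
        fieldCount = fields.size();
    }
}

/** Wrapper written against the push API: the delegate must be a push consumer. */
class PushWrapper extends PushFieldsConsumer {
    private final PushFieldsConsumer delegate;

    PushWrapper(PushFieldsConsumer delegate) {
        this.delegate = delegate;
    }

    @Override
    void addField(String field) {
        delegate.addField(field); // forwards each pushed field
    }
}

/** Wrapper cut over to write(): any FieldsConsumer can be the delegate. */
class PullWrapper extends FieldsConsumer {
    private final FieldsConsumer delegate;

    PullWrapper(FieldsConsumer delegate) {
        this.delegate = delegate;
    }

    @Override
    void write(List<String> fields) {
        delegate.write(fields); // forwards the whole pull-style call
    }
}

class WrapDemo {
    public static void main(String[] args) {
        RecordingPushConsumer push = new RecordingPushConsumer();
        new PushWrapper(push).write(List.of("title", "body"));

        PullOnlyConsumer pull = new PullOnlyConsumer();
        new PullWrapper(pull).write(List.of("title", "body"));
        new PullWrapper(push).write(List.of("id")); // also fine: push is-a FieldsConsumer

        // new PushWrapper(pull); // does not compile: pull is not a PushFieldsConsumer
    }
}
```

In this sketch the commented-out line is the issue described above: a wrapper that stays on the push API cannot wrap a pull-only consumer, while a wrapper that overrides write() can wrap both.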


                
> invert the codec postings API
> -----------------------------
>
>                 Key: LUCENE-5123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5123
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch
>
>
> Currently FieldsConsumer/PostingsConsumer/etc is a "push" oriented api, e.g. FreqProxTermsWriter
> streams the postings at flush, and the default merge() takes the incoming codec api and filters
> out deleted docs and "pushes" via same api (but that can be overridden).
> It could be cleaner if we allowed for a "pull" model instead (like DocValues). For example,
> maybe FreqProxTermsWriter could expose a Terms of itself and just pass this to the codec
> consumer.
> This would give the codec more flexibility to e.g. do multiple passes if it wanted to
> do things like encode high-frequency terms more efficiently with a bitset-like encoding or
> other things...
> A codec can try to do things like this to some extent today, but it's very difficult (look
> at buffering in Pulsing). We made this change with DV and it made a lot of interesting optimizations
> easy to implement...
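
As a rough illustration of the "multiple passes" point in the description, the sketch below uses a plain Map from term to sorted doc ids as a stand-in for whatever re-iterable view a pull API would hand to the codec. Everything here is hypothetical: the class and method names, the Map representation, and the denseFraction threshold are assumptions made for the example, not part of Lucene or of the patch.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names, data layout, and threshold are assumptions.
class TwoPassPullSketch {

    /**
     * Pass 1 scans all postings to estimate the doc count (largest doc id + 1).
     * Pass 2 revisits each term and picks an encoding based on that estimate.
     */
    static Map<String, String> chooseEncodings(Map<String, int[]> postings,
                                               double denseFraction) {
        // Pass 1: a pull-style consumer can look at everything before writing.
        int maxDoc = 0;
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                maxDoc = Math.max(maxDoc, doc + 1);
            }
        }

        // Pass 2: terms present in a large share of docs get the bitset encoding.
        Map<String, String> encodings = new TreeMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            boolean dense = maxDoc > 0 && e.getValue().length >= denseFraction * maxDoc;
            encodings.put(e.getKey(), dense ? "bitset" : "docid-list");
        }
        return encodings;
    }

    /** Bitset-like encoding of one term's doc ids. */
    static BitSet toBitSet(int[] docs) {
        BitSet bits = new BitSet();
        for (int doc : docs) {
            bits.set(doc);
        }
        return bits;
    }
}
```

With a push API the consumer sees each term once, as it streams by, so a decision that depends on global statistics needs its own buffering; in this sketch the second loop simply iterates the same input again.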

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

