lucene-java-user mailing list archives

From Vishwas Jain <vjvis...@gmail.com>
Subject Re: Compression algorithm for posting lists
Date Mon, 28 Mar 2016 12:02:28 GMT
Thanks for the reply and the information.
I have some questions about the implementation of the lucene54 codec that
came up while reading the code that writes posting lists with the Lucene50
postings writer. What exactly does the finish() method in the TermsWriter
class of BlockTreeTermsWriter.java do? My understanding is that the posting
lists (document IDs, frequencies, etc.) are mainly written by the
writeBlock method in ForUtil.java...

Thanks..
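
To make the block-encoding idea discussed in this thread concrete, here is a minimal sketch of frame-of-reference style coding for a sorted list of document IDs: store the gaps between IDs, find the bit width of the largest gap, and pack every gap with that width. This is an illustration only, not Lucene's ForUtil code; the class name ForSketch and all of its methods are hypothetical, and it ignores frequencies, positions, and any patching of outliers.

```java
import java.util.Arrays;

public class ForSketch {

    /** Bits needed to represent the largest value in the block (at least 1). */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Converts sorted doc IDs into gaps; the first gap is relative to base. */
    static int[] toDeltas(int[] docIds, int base) {
        int[] deltas = new int[docIds.length];
        int prev = base;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return deltas;
    }

    /** Rebuilds doc IDs from gaps by taking a running sum. */
    static int[] fromDeltas(int[] deltas, int base) {
        int[] docIds = new int[deltas.length];
        int prev = base;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];
            docIds[i] = prev;
        }
        return docIds;
    }

    /** Packs non-negative values into longs using a fixed bit width per value. */
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) values[i]) << shift;
            if (shift + bits > 64) {
                // The value straddles two longs; store the high bits in the next one.
                packed[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reverses pack() for a known bit width and value count. */
    static int[] unpack(long[] packed, int bits, int count) {
        int[] values = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21, 150};
        int[] deltas = toDeltas(docIds, 0);
        int bits = bitsRequired(deltas);
        long[] packed = pack(deltas, bits);
        int[] decoded = fromDeltas(unpack(packed, bits, docIds.length), 0);
        System.out.println("bits per value: " + bits);
        System.out.println("round trip ok: " + Arrays.equals(docIds, decoded));
    }
}
```

The whole block pays for the width of its largest gap, which is why a single large gap is costly and why patched variants store such outliers separately.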

> On Mon, Mar 28, 2016 at 4:21 PM, Greg Bowyer <gbowyer@fastmail.co.uk>
> wrote:
>
>> The posting list is compressed using a specialised technique aimed at
>> pure numbers. Currently the codec uses a variant of Patched Frame of
>> Reference coding to perform this compression.
>>
>> A good survey of such techniques can be found in the standard IR books
>> (https://mitpress.mit.edu/books/information-retrieval,
>> http://www.amazon.com/Managing-Gigabytes-Compressing-Multimedia-Information/dp/1558605703,
>> http://nlp.stanford.edu/IR-book/) as well as in this paper:
>> http://eprints.gla.ac.uk/93572/1/93572.pdf.
>>
>> Interestingly, there are potentially some wins in finding better integer
>> codings (and one of my personal projects is aimed at doing exactly
>> this), but I doubt LZ4 compressing the posting list would help all that
>> much.
>>
>> Hope this helps
>>
>> On Mon, Mar 28, 2016, at 10:51 AM, Vishwas Jain wrote:
>> > Hello,
>> >
>> > We are trying to implement better compression techniques in the
>> > lucene54 codec of Apache Lucene. Currently there is no such compression
>> > for posting lists in the lucene54 codec, but LZ4 compression is used
>> > for stored fields. Does anyone know why there is no compression for
>> > posting lists, and which compression techniques would be beneficial
>> > if implemented?
>> >
>> > Thanks