lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Lucene Compression
Date Wed, 02 Apr 2008 12:09:37 GMT
It's generally considered best practice to compress things first in
your app and then add them as a binary field. That being said, I
don't see why that would blow up on its own. Have you tried
compressing it outside of Lucene to see what happens? If you can
reproduce it as a test case for Lucene, that would be great.
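
To try that outside of Lucene, something along these lines might do. It is
only a rough sketch using java.util.zip; the class name, the helper names and
the sample record are made up for illustration and are not Lucene code. The
byte[] that compress() returns is what you would then add as a stored binary
field.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: compress in the application,
// then store the resulting bytes as a binary field.
final class AppSideCompression {

    // Deflate the input at the highest compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream(Math.max(32, input.length));
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int count = deflater.deflate(buf);
                bos.write(buf, 0, count);
            }
            return bos.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate bytes produced by compress(); used when reading the field back.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream bos = new ByteArrayOutputStream(compressed.length * 2);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buf);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                bos.write(buf, 0, count);
            }
            return bos.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Made-up pipe-delimited record, similar in shape to the one in the question.
        byte[] raw = "FILE01.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|11|"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
    }
}
```

Comparing the two lengths for your real records should tell you whether
compression pays off before Lucene is involved at all.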

From FieldsWriter, Lucene's compression code looks like:

private final byte[] compress (byte[] input) {

       // Create the compressor with highest level of compression
       Deflater compressor = new Deflater();
       compressor.setLevel(Deflater.BEST_COMPRESSION);

       // Give the compressor the data to compress
       compressor.setInput(input);
       compressor.finish();

       /*
        * Create an expandable byte array to hold the compressed data.
        * You cannot use an array that's the same size as the original because
        * there is no guarantee that the compressed data will be smaller than
        * the uncompressed data.
        */
       ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

       // Compress the data
       byte[] buf = new byte[1024];
       while (!compressor.finished()) {
         int count = compressor.deflate(buf);
         bos.write(buf, 0, count);
       }

       // Release the compressor's native resources
       compressor.end();

       // Get the compressed data
       return bos.toByteArray();
}

There is an interesting comment in that code about how the compressed  
data won't necessarily be smaller, so maybe you have entered the  
compression twilight zone.
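
To see that effect in isolation, here is a small sketch (again only
java.util.zip, with a made-up class and method name) that measures the
deflated size of a very short input and of a long repetitive one. The default
Deflater writes zlib-wrapped output, which carries a 2-byte header and a
4-byte Adler-32 checksum on top of the deflate block itself, so very short
inputs cannot come out smaller. Whether that is what is happening with your
records is a guess on my part.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class: compares input size with deflated size.
final class DeflateOverheadDemo {

    // Number of bytes Deflater produces for the given input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] tiny = {'a', 'b', 'c'};

        byte[] repetitive = new byte[10_000];
        Arrays.fill(repetitive, (byte) 'a');

        System.out.println("tiny:       " + tiny.length + " -> " + deflatedSize(tiny));
        System.out.println("repetitive: " + repetitive.length + " -> " + deflatedSize(repetitive));
    }
}
```

The short input grows and the repetitive one shrinks a great deal, so it is
worth checking how large and how repetitive each stored value really is.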


On Apr 2, 2008, at 12:51 AM, Sebastin wrote:

> Hi All,
> Is there any way to create a compressed store for the following
> type of string in a Lucene index?
> String str = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|
> 9840836588| 129382152520| 04F4243B600408|04F4243B600408|
> |11919898456123|354943011025810L| "CPTBS2I"| "ABCD3E"|11| 
> 1234510003243219I|"
> I tried to store these fields with Field.Store.COMPRESS, but the
> result is larger than the original file.

Grant Ingersoll
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
