lucene-dev mailing list archives

From Otis Gospodnetić <otis.gospodne...@gmail.com>
Subject Re: Delta of delta encoding
Date Tue, 25 Apr 2017 17:43:32 GMT
Hi,

On Tue, Apr 25, 2017 at 4:06 AM, Adrien Grand <jpountz@gmail.com> wrote:

> I think it makes sense indeed for time-series databases. The time field
> should grow by regular increments, and numerical values of consecutive
> documents are likely to be close to each other. Both cases should compress
> efficiently by doing delta of delta encoding.
>
> We haven't really started exploring leveraging the fact that doc values
> have an iterator API for compression at all. I think this delta-of-delta
> approach would be interesting to explore. Maybe we could encode values in
> blocks like postings and decide how to encode each block based on the
> actual data. Delta-of-delta would be one option, but sometimes we might
> also go with RLE or FOR depending on which one suits the actual data best.
>
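
To make the delta-of-delta idea in the quoted text concrete, here is a minimal
Python sketch. It is not Lucene or Gorilla code: the function names are
hypothetical, and it stops at the integer level (the variable-length bit
packing that a real codec would apply to the small residuals is left out).

```python
def delta_of_delta_encode(values):
    """Return [first value, first delta, second-order deltas...].

    Hypothetical helper for illustration; operates on plain Python ints.
    """
    if not values:
        return []
    out = [values[0]]
    prev = values[0]
    prev_delta = 0  # so the second output element is the first delta itself
    for v in values[1:]:
        delta = v - prev
        out.append(delta - prev_delta)  # change in the delta
        prev, prev_delta = v, delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode by accumulating deltas, then values."""
    if not encoded:
        return []
    values = [encoded[0]]
    prev = encoded[0]
    prev_delta = 0
    for dod in encoded[1:]:
        prev_delta += dod   # rebuild the first-order delta
        prev += prev_delta  # rebuild the value
        values.append(prev)
    return values


# Timestamps arriving roughly every 60 units, with a little jitter.
timestamps = [1000, 1060, 1120, 1180, 1241, 1300]
encoded = delta_of_delta_encode(timestamps)
print(encoded)  # [1000, 60, 0, 0, 1, -2]
```

With regular increments, everything after the first two entries is zero or
close to it, which is what makes the residuals cheap to store.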

Sounds great!  I created https://issues.apache.org/jira/browse/LUCENE-7806
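
The per-block choice suggested in the quoted message (delta-of-delta, RLE, or
FOR, whichever suits the data) could look roughly like the sketch below. This
is only an illustration in Python, not a proposal for the actual Lucene
format: the function names and the cost model (64-bit raw values, 32-bit run
lengths, fixed-width residuals) are assumptions made up for the example.

```python
def bits_needed(n):
    """Bits to store a non-negative integer (at least 1)."""
    return max(1, n.bit_length())


def for_cost(block):
    """Frame of reference: store the minimum, then fixed-width offsets."""
    lo = min(block)
    width = bits_needed(max(block) - lo)
    return 64 + width * len(block)


def rle_cost(block):
    """Run-length encoding: one (value, run length) pair per run."""
    runs = 1 + sum(1 for a, b in zip(block, block[1:]) if a != b)
    return runs * (64 + 32)  # assumed: 64-bit value, 32-bit run length


def delta_of_delta_cost(block):
    """First value and first delta raw, then fixed-width second-order deltas."""
    if len(block) < 3:
        return 64 * len(block)
    deltas = [b - a for a, b in zip(block, block[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    # Doubling the largest magnitude leaves room for a zigzag-style sign bit.
    width = bits_needed(2 * max(abs(d) for d in dods))
    return 64 + 64 + width * len(dods)


def choose_encoding(block):
    """Pick the encoding with the smallest estimated size for this block."""
    costs = {
        "delta_of_delta": delta_of_delta_cost(block),
        "rle": rle_cost(block),
        "for": for_cost(block),
    }
    return min(costs, key=costs.get)


regular_timestamps = [1000 + 60 * i for i in range(128)]
constant_values = [7] * 128
small_range_values = [(i * 37) % 16 for i in range(128)]

print(choose_encoding(regular_timestamps))   # delta_of_delta
print(choose_encoding(constant_values))      # rle
print(choose_encoding(small_range_values))   # for
```

Under these assumed costs, evenly spaced timestamps favor delta-of-delta, a
constant block favors RLE, and values scattered within a narrow range favor
FOR.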

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On Tue, Apr 25, 2017 at 04:43, Otis Gospodnetić <otis.gospodnetic@gmail.com>
> wrote:
>
>> Hi,
>>
>> I was reading about Facebook Beringei when I spotted this:
>>
>>
>>    - Extremely efficient streaming compression algorithm. Our streaming
>>    compression algorithm is able to compress real world time series data by
>>    over 90%. The delta of delta compression algorithm used by Beringei is also
>>    fast - we see that a single machine is able to compress more than 1.5
>>    million datapoints/second.
>>
>>
>> That "*delta of delta*" caught my attention. This delta of delta
>> encoding is one of the Facebook Gorilla tricks that allows it to compress
>> a 16-byte data point into 1.37 bytes on average -- see section 4.1, which
>> describes it -- http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
>>
>> This seems to be aimed at both time fields and numerical values.
>>
>> Would Lucene benefit from this?
>>
>> https://github.com/burmanm/gorilla-tsc seems to be a fresh Java
>> implementation.
>>
>> Otis
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
