lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: Tamper resistant index
Date Tue, 10 Jan 2012 00:07:53 GMT
On 09/01/2012 16:27, Mike C wrote:
> Hi,
> I'm investigating storing syslog data using Lucene (via Solr or
> Elasticsearch, undecided at present). The syslogs belong to systems
> under the scope of the PCI DSS (Data Security Standard), and one of
> the requirements is to ensure logs aren't tampered with. I'm looking
> for advice on how to accomplish this.
> Looking through the Lucene documentation, I believe there doesn't
> exist any built-in functionality to secure index data through digital
> signatures or HMACs. Is this the case, or have I overlooked something?
> I see there is a lucenetransform project
> ( that offers encryption,
> but not digital signatures. I'm not concerned about hiding the
> contents of the data, just need to ensure it hasn't been tampered
> with. At present I use Splunk, which signs and verifies blocks of
> indexed data. Unfortunately its pricing model doesn't scale well,
> hence looking for a lucene-based solution.
> I suppose I could add a digital signature programmatically to each
> Lucene Document/Syslog, though it seems like a lot of overhead.
> lucenetransform's approach does seem to suggest that I could provide a
> digital signature version of Directory (and IndexInput/IndexOutput),
> however, before I go down that rabbit hole, I decided to check in here.
> Any advice or suggestions appreciated.

This is an interesting and important problem.

I assume that the signature(s) should be created as a part of the 
regular indexing process, and in a sense they would also depend on and 
provide a way to verify the authenticity of the application that created 
the index (because the application has to know how to create valid 
signatures). You would obviously need a counterpart application that can 
verify such signatures.

Per-document sigs do add some overhead, but if you can keep them small 
(128 bits?) then you can still use stored fields (or DocValues in trunk, 
which offer a more efficient, compact representation). Still, if you 
need non-repudiation for certain sequences of events then you need to
sign such sequences too - in Lucene terms this would probably be
segments or Directory files.
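A minimal sketch of the per-document option, using only the JDK's javax.crypto API: it computes HMAC-SHA256 over a syslog record's fields in a canonical (sorted, length-prefixed) order and truncates the tag to 128 bits, the size floated above. The class name DocSigner, representing a record as a field map, and the truncation are illustrative assumptions. Key provisioning and how the tag is stored in the index (for example as a binary stored field) are deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: a truncated (128-bit) HMAC-SHA256 over one syslog record's fields. */
final class DocSigner {
    private static final String ALGORITHM = "HmacSHA256";
    private static final int SIG_BYTES = 16; // 128 bits
    private final SecretKeySpec key;

    DocSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, ALGORITHM);
    }

    /** Signs the fields in sorted-key order, each length-prefixed, so the tag is reproducible. */
    byte[] sign(Map<String, String> fields) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
                update(mac, e.getKey());
                update(mac, e.getValue());
            }
            return Arrays.copyOf(mac.doFinal(), SIG_BYTES);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Recomputes the tag and compares it with the stored one in constant time. */
    boolean verify(Map<String, String> fields, byte[] storedSig) {
        return MessageDigest.isEqual(sign(fields), storedSig);
    }

    // The 4-byte length prefix keeps ("ab", "c") and ("a", "bc") from producing the same input.
    private static void update(Mac mac, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        mac.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        mac.update(bytes);
    }
}
```

Note that an HMAC is symmetric: whoever holds the key to verify can also produce valid tags. If the verifying party must not be able to forge entries, an asymmetric scheme (java.security.Signature) would be needed instead, at a larger per-document cost.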

So the "transformation" approach can work well for creating global
(per-segment and per-file) signatures - instead of encrypting, you would
pass all data that is written to the Directory through an HMAC algorithm, which on
stream close would simply write a signature to a separate file in 
Directory - this can be easily implemented as a Directory wrapper. The 
only complication here is that you would have to handle changes related 
to segment merges yourself, i.e. you would have to do something with sig 
files that correspond to obsolete segments (discard?).
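The per-file HMAC idea can be illustrated without Lucene on the classpath. The sketch below is a plain-JDK analogue, not Lucene's Directory API: a wrapping OutputStream feeds every written byte into an HMAC and, on close, writes the tag to a sidecar file named after the data file plus ".sig". The matching verify method plays the role of the counterpart verification application. In an actual Directory wrapper the same logic would presumably sit in the output object returned by createOutput(). The class name SigningFiles and the ".sig" naming are assumptions.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/**
 * Plain-JDK analogue of a signing Directory wrapper (not Lucene's API): bytes written to a file
 * are also fed to an HMAC, and on close() the tag goes to a sidecar file "<name>.sig".
 */
final class SigningFiles {
    private static final String ALGORITHM = "HmacSHA256";

    private static Mac newMac(byte[] secret) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac;
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    static Path sigPath(Path file) {
        return file.resolveSibling(file.getFileName() + ".sig");
    }

    /** Writer side: every byte passes through the MAC; the tag is written once, on close. */
    static OutputStream create(Path file, byte[] secret) throws IOException {
        Mac mac = newMac(secret);
        return new FilterOutputStream(Files.newOutputStream(file)) {
            private boolean closed;

            @Override
            public void write(int b) throws IOException {
                mac.update((byte) b);
                out.write(b);
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                mac.update(b, off, len);
                out.write(b, off, len);
            }

            @Override
            public void close() throws IOException {
                if (closed) {
                    return;
                }
                closed = true;
                super.close(); // flush and close the data file first
                Files.write(sigPath(file), mac.doFinal());
            }
        };
    }

    /** Verifier side: recompute the MAC over the file and compare it with the sidecar. */
    static boolean verify(Path file, byte[] secret) throws IOException {
        Mac mac = newMac(secret);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                mac.update(buf, 0, n);
            }
        }
        return MessageDigest.isEqual(mac.doFinal(), Files.readAllBytes(sigPath(file)));
    }
}
```

Writing the tag only on close keeps the write path to a single MAC update per chunk. The sidecar files still have to be cleaned up when the segments they cover disappear after a merge.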

In Lucene trunk you can use the Codec API to do essentially the same as
explained above, except that this time you can interpret the data more
easily. For example, if some aspects of the data (postings, payloads,
term dictionary) are not as important for the signature as, say, stored
fields are, then you can skip them. Finally, when a batch of documents
(corresponding to a Lucene segment) is finished, you would write the
signatures to additional files - only this time the sig files would be
known as belonging to that segment. That way you would get some help
from Lucene during segment merging: you could handle merging of the data
yourself (create additional sigs for every merge? or recompute the sig
for the new segment?), and old sigs would be deleted whenever old
segments are deleted due to merging.
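The merge-time bookkeeping for the "recompute the sig for the new segment" option might look roughly like the hypothetical sketch below. It does not use Lucene's Codec API: segments are modeled as named byte arrays, and SegmentSigManifest, sign, verify and onMerge are illustrative names. A merge is accepted only if every source segment still verifies. The merged segment is then signed and the obsolete signatures are dropped along with their segments.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical per-segment signature manifest that re-signs merged segments. */
final class SegmentSigManifest {
    private final SecretKeySpec key;
    private final Map<String, byte[]> sigs = new HashMap<>();

    SegmentSigManifest(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private byte[] mac(String segment, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            // Bind the tag to the segment name so data cannot be replayed under another name.
            mac.update(segment.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0);
            return mac.doFinal(data);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    void sign(String segment, byte[] data) {
        sigs.put(segment, mac(segment, data));
    }

    boolean verify(String segment, byte[] data) {
        byte[] expected = sigs.get(segment);
        return expected != null && MessageDigest.isEqual(expected, mac(segment, data));
    }

    boolean has(String segment) {
        return sigs.containsKey(segment);
    }

    /**
     * Verifies every source segment, signs the merged one, then discards the sources' sigs.
     * Assumes the merged segment gets a fresh name, as Lucene's merges do.
     */
    void onMerge(Map<String, byte[]> sources, String merged, byte[] mergedData) {
        for (Map.Entry<String, byte[]> e : sources.entrySet()) {
            if (!verify(e.getKey(), e.getValue())) {
                throw new SecurityException("segment " + e.getKey() + " failed verification before merge");
            }
        }
        sign(merged, mergedData);
        sigs.keySet().removeAll(sources.keySet());
    }
}
```

The trade-off: once the source signatures are discarded, the merged segment's signature only attests that the merging process itself was trusted. Keeping the old signatures next to a signature over each merge (the "additional sigs for every merge" option) preserves a longer audit trail at the cost of more files.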

I'd give the Directory-based approach a shot first, because it's
easy to implement, and then see if it's good enough.

Best regards,
Andrzej Bialecki     <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
Contact: info at sigram dot com
