lucene-general mailing list archives

From Mark Bennett <>
Subject Re: Indexing file with security problem
Date Thu, 27 Jun 2013 00:57:35 GMT
Hello Lukasz,

You have many questions, but let me address just two for now.

1: I think you should consider Solr instead of plain Lucene.  There is much more infrastructure
already in place.

2: In security you have a choice of "early binding" or "late binding".  Early binding will
put information in the index ahead of time for a filter later on, as you have suggested.

You say security is "dynamic", but probably not as much as somebody might think.

You mention user Maggie as an example, having access to only certain files.  At index time,
for each file, is it possible to look up which groups and individuals are allowed to see this
file?  You'd query the system for that document, asking for the ACL data, and store it in
the index.  Later, at search time, you query for the user's groups and build a filter.
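
To make the early-binding idea concrete, here is a minimal sketch of the search-time half: turning a user's name and groups into a filter string. It assumes the ACL entries were stored at index time in a multi-valued field called "acl", using "user:" and "group:" prefixes. The field name, the prefixes, and the class itself are hypothetical and are not part of Lucene or Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene or Solr class.
final class EarlyBindingSketch {

    // Wrap a principal in double quotes, escaping backslashes and quotes
    // so the value is read as a single quoted term by the query parser.
    static String quote(String principal) {
        return "\"" + principal.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a filter over the assumed "acl" field that matches any document
    // listing either the user or one of the user's groups.
    static String buildAclFilter(String user, List<String> groups) {
        List<String> terms = new ArrayList<String>();
        terms.add(quote("user:" + user));
        for (String group : groups) {
            terms.add(quote("group:" + group));
        }
        StringBuilder sb = new StringBuilder("acl:(");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(terms.get(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // prints: acl:("user:maggie" OR "group:hr" OR "group:finance")
        System.out.println(buildAclFilter("maggie", Arrays.asList("hr", "finance")));
    }
}
```

With Solr, a string like this could be sent as a filter query (the fq parameter) alongside the user's actual search terms, so the security restriction stays separate from relevance scoring.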

If there is no way to look up the ACL data at index time, then you would have to use late binding.
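
As a rough sketch of what late binding could look like, the code below runs the permission check after the search, one hit at a time, and stops once enough visible results have been collected. The AccessChecker interface stands in for whatever system owns the permissions; it and the other names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene or Solr class.
final class LateBindingSketch {

    // Assumed callback into the system that decides, at query time,
    // whether a given user may see a given file.
    interface AccessChecker {
        boolean canRead(String user, String fileId);
    }

    // Walk the ranked hits in order and keep only the ones the user may read,
    // stopping as soon as 'wanted' visible results have been found.
    static List<String> filterHits(String user, List<String> rankedFileIds,
                                   AccessChecker checker, int wanted) {
        List<String> visible = new ArrayList<String>();
        for (String fileId : rankedFileIds) {
            if (visible.size() >= wanted) {
                break;
            }
            if (checker.canRead(user, fileId)) {
                visible.add(fileId);
            }
        }
        return visible;
    }
}
```

The cost is one permission check per examined hit, so a user who can see very little may force many checks before a page of results fills up. That is the usual trade-off against early binding.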

I know your email has other questions, but I'll leave those for later.  I think these are
the 2 most basic things to decide.


Mark Bennett / LucidWorks: Search & Big Data /
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 26, 2013, at 12:43 PM, Łukasz Woźniczka <> wrote:

> Hello
> I'll try to briefly describe my problem and task.
> My name is Lukas and I am a Java developer. My task is to create a search
> engine for different types of files (text file types only): PDF, Word, ODF,
> XML, but not HTML.
> I have a little experience with Lucene: about a year ago I wrote a simple
> full-text search using Lucene and Hibernate Search. That was a simple project,
> but now I have a very difficult search task.
> We are using Java 1.7 and GlassFish 3, and I have to concentrate only on the
> server side, not the client UI. These are my three major problems:
> 1) All files are stored on a WebDAV server, but information about each file
> (name, id, file type, etc.) is stored in a database (PostgreSQL), so when
> creating the index I need to use both sources. As the result of a query I only
> need to return the file id from the database. In summary, the content of a file
> is stored on the server but the information about the file is stored in the
> database, so we must retrieve both.
> 2) The second problem is that each file has a level of secrecy. The major
> problem is that this level is calculated dynamically. When calculating the
> security level for a file we consider several properties. The static
> property is the file's location (the folder the file is in), but there is also
> dynamic information: user profiles, user roles, and departments. So when
> user "Maggie" is logged in she can search only the files "test.pdf", "test2.doc",
> etc., but when user "Stev" is logged in he has a different profile from
> Maggie, so he can only search for a phrase in the files "broken.pdf",
> "mybook.odt", "test2.doc", etc. I think that when, for example, a user searches
> for the phrase "lucene +solr", we search all indexed documents and then filter
> the results. But I think that solution is not very efficient. What if the
> result contains 100 files? Do we then filter each file one by one?
> But I do not see any other solution. Maybe you can help me, and Lucene or
> Solr has a mechanism that helps.
> 3) The last problem is that some files are encrypted, so those files must be
> indexed only once, before encryption! But I think that if we index secured
> files we get a security issue, because every word from the file is
> tokenized.
> I have no idea how to secure Lucene documents and the index data store.
> Is it possible?
> I also have a question: should I use Solr for my search engine, or
> use only Lucene and write my own search engine? So as you can see, I do
> not have a problem with indexing or searching, but with file security and
> file security levels.
> Thanks for any hints and the time you spend on me.
> -- 
> Regards Lukasz
