lucene-solr-user mailing list archives

From Darx Oman
Subject Indexing Best Practice
Date Mon, 11 Apr 2011 06:20:13 GMT
Hi guys

I'm wondering how best to configure Solr to fulfill my requirements.

I'm indexing data from 2 data sources:
1- Database
2- PDF files (password encrypted)

Every file has related information stored in the database.  Both the file
content and the related database fields must be indexed as one document in
Solr.  Among the DB data are *per-user* permissions for every document.

The file contents almost never change.  The DB data, on the other hand, and
especially the permissions, change very frequently, which requires me to
re-index everything for every modified document.

My problem is the process of decrypting the PDF files before re-indexing
them, which takes too much time for a large number of documents; a full
re-index could take days.

What I'm trying to accomplish is to eliminate the need to re-index the PDF
content when it has not changed, even if the DB data has.  I know this is
not possible in Solr, because Solr doesn't update documents in place.

So how can I best accomplish this?

Can I use two indexes, one for the PDF contents and the other for the DB
data, with a common id field as the link between them, *so that results are
treated as one document*?
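
To make the idea concrete, here is a rough sketch of the behaviour I am
after.  It is plain Python with dictionaries standing in for the two
indexes; the function and field names (index_content, index_metadata,
allowed_users) are made up for illustration and are not Solr APIs.

```python
# Illustration only: plain dictionaries stand in for the two indexes.
# None of these names are Solr APIs.

content_index = {}   # id -> text extracted from the decrypted PDF (rarely changes)
metadata_index = {}  # id -> DB fields, including per-user permissions (changes often)


def index_content(doc_id, pdf_text):
    """Expensive path: run only when the PDF itself changes."""
    content_index[doc_id] = {"id": doc_id, "content": pdf_text}


def index_metadata(doc_id, fields):
    """Cheap path: run whenever the DB row or the permissions change."""
    metadata_index[doc_id] = {"id": doc_id, **fields}


def merged_document(doc_id):
    """Combine both sides on the common id so the caller sees one document."""
    merged = {}
    merged.update(content_index.get(doc_id, {}))
    merged.update(metadata_index.get(doc_id, {}))
    return merged


def search(term, user):
    """Return merged documents whose content matches and that the user may read."""
    hits = []
    for doc_id, doc in content_index.items():
        meta = metadata_index.get(doc_id, {})
        matches = term.lower() in doc["content"].lower()
        allowed = user in meta.get("allowed_users", [])
        if matches and allowed:
            hits.append(merged_document(doc_id))
    return hits
```

The point is that a permission change would only call index_metadata, so
the decrypted PDF text is never touched, yet a search still returns a single
merged document per id.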
