lucene-java-user mailing list archives

From "Joe MA" <mrj...@comcast.net>
Subject Not-indexed, Stored Thumbnails or NoSQL?
Date Sun, 02 Dec 2018 09:20:13 GMT
Greetings,

I have an index into which I import documents such as PowerPoint, PDF, and so forth.  One nice
feature I added is that, for each document, I store a thumbnail of the first page as an encoded
String (uuencode) in a stored, not-indexed field.  This thumbnail is displayed when the
user finds a document.

I am wondering how efficient this is as the index grows to perhaps hundreds of thousands, if not
millions, of documents.  Is it a good idea?
These encoded strings could be several hundred bytes in size, are of course completely
unique for each file indexed, and provide no 'search' value.  On the surface, it seems like
there could be a better way to do this, given the size as well as the extra retrieval time
for Lucene to pull these fields for found documents.

Since I also have a unique hash for each document in the index, it would not be too difficult
to set up a separate, independent NoSQL key/value store with the thumbnail images, such as
MongoDB or similar, and then retrieve the thumbnails from that store instead of keeping them
in the Lucene index.  Does this seem like a better approach? Or is Lucene stored field retrieval
efficient enough that there would be no benefit to doing this?  Any other ideas?
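
To make the alternative concrete, here is a rough sketch of the lookup I have in mind. It is
only an illustration: the in-memory map stands in for whatever key/value store would actually
be used, SHA-256 is an assumed choice of hash, and the class and method names (ThumbnailStore,
hashOf, put, get) are hypothetical rather than part of Lucene or any NoSQL client.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical thumbnail store keyed by the document's unique hash.
// The ConcurrentHashMap is an in-memory stand-in for an external key/value store.
final class ThumbnailStore {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    // Assumption: the per-document hash is a hex-encoded SHA-256 of the file bytes.
    static String hashOf(byte[] documentBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(documentBytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Save the raw thumbnail bytes under the document hash (defensive copy).
    void put(String docHash, byte[] thumbnail) {
        store.put(docHash, thumbnail.clone());
    }

    // Fetch the thumbnail for a search hit; empty if none was stored.
    Optional<byte[]> get(String docHash) {
        byte[] thumbnail = store.get(docHash);
        return thumbnail == null ? Optional.empty() : Optional.of(thumbnail.clone());
    }

    public static void main(String[] args) {
        ThumbnailStore thumbnails = new ThumbnailStore();
        byte[] document = "example document bytes".getBytes(StandardCharsets.UTF_8);
        byte[] thumbnail = {1, 2, 3, 4}; // placeholder for real image bytes

        String key = hashOf(document);
        thumbnails.put(key, thumbnail);

        // At search time, only the hash would need to come back from the index.
        System.out.println(thumbnails.get(key).isPresent());
    }
}
```

With something like this, the index would only need to return the hash for each hit, and the
thumbnail bytes would be fetched separately, and only for the results actually displayed.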

Thanks in advance,
J


  


