lucene-solr-user mailing list archives

From "J.J. Larrea" <>
Subject Re: Searching multiple indices (solr newbie)
Date Tue, 09 Jan 2007 21:18:16 GMT
+2 cents:

At 2:43 PM +0530 1/9/07, Mekin Maheshwari wrote:
>In general I felt that smaller indexes with different requirements
>might be more flexible than 1 large index (would a 3G index be
>considered large?), e.g. for backing up the index, deploying a fresh
>index, etc. But Solr does address most of these.

3 GB indexes are not at all unreasonable: I have a Lucene-based (soon-to-be SOLR-based) app
which uses 5 indexes, the biggest of which is 3.8 GB. The combined index is 6.7 GB.

>The assumption could be baseless now & I should probably consider
>having 1 index for all categories.

An important thing to note is that Lucene does not store information in a grid as RDBMSs do;
it only stores the fields which are explicitly defined for each Document. So if some class
of Documents has a set of class-specific fields, there is no storage penalty for the other
Documents which don't have them. And Lucene's querying mechanism is very efficient at dealing
with sparse values in the index, so the query-time penalty is slight.
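To make the sparse-storage point concrete, here is a toy Java model. It is NOT Lucene's API: the class `SparseIndexSketch` and its methods are hypothetical, and it only mimics the idea that each document holds just its own fields and that the inverted index is keyed by field, so documents lacking a field cost nothing for that field.

```java
import java.util.*;

// Hypothetical toy model of field-sparse storage (not Lucene's API).
// Each document keeps only the fields it actually has; postings map
// field -> term -> doc ids, so a field most documents lack costs them nothing.
public class SparseIndexSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Adds a document and returns its id (its position in the doc list).
    public int add(Map<String, String> fields) {
        int id = docs.size();
        docs.add(new HashMap<>(fields));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(e.getValue(), k -> new TreeSet<>())
                    .add(id);
        }
        return id;
    }

    // Exact-match lookup: only the postings for this field/term are read,
    // never the documents that do not have the field at all.
    public Set<Integer> search(String field, String term) {
        Set<Integer> hits = postings.getOrDefault(field, Collections.emptyMap())
                                    .getOrDefault(term, Collections.emptySet());
        return Collections.unmodifiableSet(hits);
    }

    // Number of fields actually stored for a document (no empty "columns").
    public int storedFieldCount(int docId) {
        return docs.get(docId).size();
    }

    public static void main(String[] args) {
        SparseIndexSketch idx = new SparseIndexSketch();
        // Books carry an "author" field; music items carry "artist" instead.
        idx.add(Map.of("category", "book", "title", "Some Book", "author", "Example Author"));
        int cd = idx.add(Map.of("category", "music", "title", "Some Album", "artist", "Example Band"));
        System.out.println(idx.search("author", "Example Author"));
        System.out.println(idx.storedFieldCount(cd));
    }
}
```

The music document never stores an empty `author` slot, which is the contrast with an RDBMS row drawn above.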

As Hoss pointed out, SOLR's wildcard-field specification makes it very simple to take advantage
of Lucene's sparse storage: SOLR will tell Lucene to index and/or store any field matching
one of the wildcard patterns, and the Request Handlers accept * in the field list (the fl
parameter), which returns all stored fields in the resulting documents.
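In schema.xml these wildcard fields are declared with entries such as `<dynamicField name="*_s" type="string" indexed="true" stored="true"/>`. The Java below is a simplified model of how a field name might be resolved against such patterns, not Solr's own code. It assumes explicit fields win over patterns, that each pattern has a single leading or trailing `*`, and that longer patterns are tried first (as the example schema's comments describe). The class `DynamicFieldSketch` and the field names are hypothetical.

```java
import java.util.*;

// Simplified model of Solr-style dynamic field matching (illustration only).
// A pattern has one '*' at the start ("*_s") or the end ("attr_*").
public class DynamicFieldSketch {
    private final Set<String> explicitFields;
    private final List<String> patterns; // longest first; ties keep declaration order

    public DynamicFieldSketch(Collection<String> explicitFields, List<String> patterns) {
        this.explicitFields = new HashSet<>(explicitFields);
        List<String> sorted = new ArrayList<>(patterns);
        // List.sort is stable, so equal-length patterns stay in declaration order.
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        this.patterns = sorted;
    }

    static boolean matches(String pattern, String field) {
        if (pattern.startsWith("*")) return field.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return field.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(field);
    }

    // Returns the explicit field name or the dynamic pattern governing `field`,
    // or null if nothing matches (Solr would normally reject such a field).
    public String resolve(String field) {
        if (explicitFields.contains(field)) return field;
        for (String p : patterns) {
            if (matches(p, field)) return p;
        }
        return null;
    }

    public static void main(String[] args) {
        DynamicFieldSketch schema = new DynamicFieldSketch(
                List.of("id", "title"),
                List.of("*_s", "*_i", "attr_*"));
        for (String f : List.of("title", "isbn_s", "attr_color_s", "pages")) {
            System.out.println(f + " -> " + schema.resolve(f));
        }
    }
}
```

Note that `attr_color_s` matches both `*_s` and `attr_*`; under the longest-pattern assumption it resolves to `attr_*`. Category-specific fields can therefore be added per document without touching the schema, and a request with `fl=*` returns whichever of them were stored.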

So while there may still be some issues needing to be worked out with a single index in your
specific case, it is probably much simpler than integrating hits from multiple indexes.
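One reason merging hits from separate indexes is awkward is that relevance scores depend on per-index statistics such as document frequency, so raw scores from two indexes are not directly comparable. The sketch below assumes a classic tf-idf style formula, idf = 1 + ln(numDocs / (docFreq + 1)), purely for illustration (the exact formula differs between Lucene versions); the class name and the term counts are hypothetical.

```java
// Illustration only: shows that the same term gets very different idf
// weights in two separate indexes, versus one shared weight in a single index.
public class CrossIndexIdfSketch {
    // Assumed classic tf-idf style idf; not necessarily Lucene's exact formula.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical counts for one term: rare in index A, common in index B.
        double idfA = idf(10, 1_000_000);
        double idfB = idf(50_000, 100_000);
        // The same documents in one combined index share a single statistic.
        double idfCombined = idf(10 + 50_000, 1_000_000 + 100_000);
        System.out.printf("index A: %.2f  index B: %.2f  combined: %.2f%n",
                idfA, idfB, idfCombined);
    }
}
```

A match in index A would outscore an otherwise identical match in index B purely because of collection statistics; a single index applies one consistent weight, which avoids having to normalize or re-rank merged results.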

- J.J.
