lucene-solr-user mailing list archives

From Norberto Meijome <free...@meijome.net>
Subject Re: tagging application, best way to architect?
Date Thu, 10 Jul 2008 04:35:42 GMT
On Thu, 10 Jul 2008 09:36:01 +0530
"Noble Paul" <noble.paul@gmail.com> wrote:

> > 2. We're assuming we'll have thousands of users with independent data; any
> > good way to partition multiple indexes with solr?   With Lucene we could
> > just save those in independent directories, and cache the index while the
> > user session is active.   I saw some configurations on tomcat that would
> > allow multiple instances, but that's probably not practical for lots of
> > concurrent users.  
> Maintaining multiple indices is not a good idea. Add an extra
> attribute 'userid' to each document and search with user id as a 'fq'.
> The caches in Solr will automatically take care of the rest.
> >
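A minimal sketch of the quoted "userid as fq" suggestion. It assumes a single Solr core at a hypothetical http://localhost:8983/solr/select and a field named userid. The search terms go in q, and the per-user restriction goes in fq, so Solr can cache that filter separately from the query text:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; substitute your own host, port and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def user_query_url(user_query: str, userid: str) -> str:
    """Build a select URL restricted to one user's documents."""
    # Escape backslashes and quotes so the userid stays one quoted term.
    safe_id = userid.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", user_query),              # what the user typed
        ("fq", f'userid:"{safe_id}"'),  # per-user filter, cached by Solr
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

Because every user's query reuses the same fq string, repeat searches by that user can be served from the cached filter instead of recomputing it.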

I have been pondering something similar for some of the stuff I'm
working on.

Intuitively, keeping independent indices doesn't look too good. But you could
split your setup (i.e., two different clusters if need be): one index for the
information that doesn't change often (email body, from, to, date, headers?)
plus the message id (or id = concat(message_id, userid)), and a separate index
for the metadata of the documents in the first index.
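A sketch of the split described above, under stated assumptions: the Email class and the metadata fields tags and read are hypothetical, and the "|" separator in the composite id is my addition (assuming message ids never contain "|") so the two parts stay distinguishable. Both documents share the same id, which is what ties a metadata record back to its email:

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    # Hypothetical in-memory representation of one message in a mailbox.
    message_id: str
    userid: str
    sender: str
    recipients: list
    date: str
    body: str
    tags: list = field(default_factory=list)  # hypothetical metadata
    read: bool = False                        # hypothetical metadata


def doc_id(message_id: str, userid: str) -> str:
    # concat(message_id, userid) with an assumed "|" separator.
    return f"{message_id}|{userid}"


def split_email(email: Email) -> tuple:
    """Return (content_doc, metadata_doc) destined for the two indexes."""
    key = doc_id(email.message_id, email.userid)
    content_doc = {  # rarely changes: indexed once into the content index
        "id": key,
        "userid": email.userid,
        "from": email.sender,
        "to": email.recipients,
        "date": email.date,
        "body": email.body,
    }
    metadata_doc = {  # changes often: lives in the metadata index
        "id": key,
        "userid": email.userid,
        "tags": email.tags,
        "read": email.read,
    }
    return content_doc, metadata_doc
```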

Every time the mail metadata changes, you handle the update in the second
index only. (I'm not sure whether this second index would be the definitive
storage for mail metadata, or whether the metadata lives in your mail app and
you extract it and index it into SOLR afterwards.)
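A sketch of a metadata-only update, assuming the metadata index uses the same composite id and has hypothetical fields tags (multi-valued) and read. Re-adding a document with an existing uniqueKey replaces the whole document in Solr, so keeping the volatile fields in their own small document means a tag change re-sends only that document, never the email body. The message follows Solr's XML <add><doc><field> update format and would be POSTed to the metadata index's /update handler:

```python
import xml.etree.ElementTree as ET


def metadata_add_xml(doc_id: str, userid: str, tags: list, read: bool) -> str:
    """Build a Solr XML <add> message for the metadata index only."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name: str, value: str) -> None:
        # Each value becomes <field name="...">value</field>.
        ET.SubElement(doc, "field", name=name).text = value

    add_field("id", doc_id)
    add_field("userid", userid)
    for tag in tags:  # multi-valued field: one <field> element per value
        add_field("tags", tag)
    add_field("read", "true" if read else "false")
    return ET.tostring(add, encoding="unicode")
```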

There is, of course, the new issue of scrubbing the second index when emails
are removed from your system, but I don't imagine it being terribly complex.
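For the scrubbing step, a sketch using Solr's XML <delete> messages: delete-by-id for one removed email, and delete-by-query for everything a user owned. It assumes both indexes key documents by the same composite id, so the same message can be sent to each:

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(doc_id: str) -> str:
    # <delete><id>...</id></delete> removes one document by uniqueKey.
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_user_xml(userid: str) -> str:
    # <delete><query>...</query></delete> removes every matching document;
    # assumes simple userids that need no query escaping.
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = f'userid:"{userid}"'
    return ET.tostring(delete, encoding="unicode")
```

Deletions only become visible to searchers after a <commit/> is sent to the index in question.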

This way, you can do without SOLR-139 until it is stable enough and scales as
needed, or do without it altogether; I'm not sure how well SOLR-139 will
progress.

Regarding the OP's question about how to partition the data across thousands
of users, you should be able to use
http://wiki.apache.org/solr/DistributedSearch, or set up different clusters,
each with distributed search configured, and use the userid to decide which
cluster to search (hash(userid) should give you a roughly even distribution of
users across all clusters).
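A sketch of hash-based routing plus distributed search, assuming a hypothetical three-cluster layout. A stable digest picks the cluster, and the query carries Solr's shards parameter (comma-separated host:port/path entries) listing that cluster's shards; the fq is still needed because each cluster holds many users:

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical layout: each cluster is a list of shard addresses.
CLUSTERS = [
    ["solr-a1:8983/solr", "solr-a2:8983/solr"],
    ["solr-b1:8983/solr", "solr-b2:8983/solr"],
    ["solr-c1:8983/solr", "solr-c2:8983/solr"],
]


def cluster_for(userid: str, n_clusters: int) -> int:
    # A stable digest instead of Python's built-in hash(), which is salted
    # per process for str and would route a user differently after restart.
    digest = hashlib.md5(userid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_clusters


def search_url(userid: str, user_query: str) -> str:
    shards = CLUSTERS[cluster_for(userid, len(CLUSTERS))]
    params = [
        ("q", user_query),
        ("fq", f'userid:"{userid}"'),    # assumes simple userids
        ("shards", ",".join(shards)),    # fan out across this cluster only
    ]
    # Any node of the chosen cluster can coordinate; use the first.
    return f"http://{shards[0]}/select?" + urlencode(params)
```

Two caveats with this design: the distribution is even in number of users, not in data volume, so a few heavy users can still skew one cluster; and with plain modulo, adding a cluster reassigns most users, so their data would have to move.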

Thoughts? 
B
_________________________
{Beto|Norberto|Numard} Meijome

Q. How do you make God laugh?
A. Tell him your plans.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
