lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Per user data store
Date Tue, 05 Aug 2008 13:40:03 GMT
I'd start out with one index, if for no other reason
than keeping track of one index for each user would
be a royal pain in the neck. You haven't told us
how many users or documents you expect,
so that's just a guess. There's one answer perhaps
if you wind up with a 10M index, another if it's 10T.....

Filtering on the username is a fine idea, although
I'd also start by just ANDing in the username to
the query to start. Then measure your resonse
time. Note that the first time you open a reader, the
response will be slow so measure queries 2-n

I don't know the guts of Lucene, but my indexes do NOT
grow linearly with the data. After a very few docs, adding,
say, 1M of data does not cause the data to grow by 1M (or
even close to that) for fields that are NOT stored. I've
learned to just trust that the very bright people who work
on Lucene have "done the right thing" <G>...


On Tue, Aug 5, 2008 at 8:36 AM, Ganesh - yahoo <>wrote:

> Hello all,
> Documents coressponding to multiple users are to be indexed. Each user is
> going to search only his documents. Only Administrator could search all
> users data.
> Is it good to have one database for each User or to have only one database
> for all Users? Which will be better?
> My opinion is to have one database for all users and to have field
> 'Username'. Using this field data will get filtered out and the search
> results will be served to the User. In this approach, whether Username
> should be part of boolean query or TermFilter will be the better approach?
> One more technical question: Username field will have repeated entry of the
> user names. Whether the space for this field will be consumped for every
> document / record or the data will be tokenzied and a pointer to the
> document will be stored.
> Regards
> Ganesh

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message