lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael D. Curtin" <>
Subject Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)
Date Wed, 02 Nov 2005 16:07:00 GMT
Richard Jones wrote:

> The data i'm dealing with is stored over a few mysql dbs on different 
> machines, horizontally partitioned so each user is assigned to a single db. 
> The queries i'm doing can be done in SQL in parallel over all machines then 
> combined, which i've tested - it's unacceptably slow :(
> There is rather a lot of data; each profile can contain in excess of 1000 
> artists, and there are almost 1million profiles created. I'm only adding the 
> top N artists for each user to a lucene index, which is fast and still yields 
> good results (even with just the top 100 artists each).
> I think this could be done reasonably well with SQL in a DBMS like Oracle RAC 
> or something like mysql cluster.. sadly the former doesnt scale financially, 
> and the latter requires everything to be held in RAM .. which is why i'm 
> using lucene.

If you're willing to continue subsetting / summarizing the data out into 
Lucene, how about subsetting it out into a dedicated MySQL instance for 
this purpose?  100 artists * 1M profiles * 2 ints * 4 bytes/int = 
roughly 1 GB of data, which would easily fit into RAM.  Queries should 
be pretty fast off of that.  Good luck!


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message