lucene-java-user mailing list archives

From Keith Watson
Subject Memory Usage
Date Thu, 03 Jul 2008 21:25:51 GMT
Hello All,

I have something that's not exactly causing me a major problem, but I  
would appreciate help in understanding the behaviour here:

I have an internet message board, and I hope to revamp the code soon
to use Lucene for searching the threads and posts, as it's far
better than the database's fulltext capability. However, one of the
things I want is for a user to be able to request a list of posts
written by user x, ordered newest first (and it's this sorting of
the items by date that is the issue).

To do this, I have a timestamp in the index, along with each post,  
user etc.

I find that if I use the Java SimpleDateFormat class to encode the  
timestamp like this: yyMMdd (let's not worry about the year 2100  
problem for now!), then I can measure the index cache (which is fully  
loaded, since I need to sort the results) as taking somewhere in the  
region of 30M of memory.
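
For reference, a minimal sketch of the kind of encoding described
above. The class and method names are hypothetical, and the fixed UTC
time zone and Locale.ROOT are assumptions added only so that the output
is reproducible; this is not the actual indexing code from the board.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TimestampEncoding {
    // Formats a timestamp with the given SimpleDateFormat pattern.
    // UTC and Locale.ROOT are assumptions made for reproducible output.
    static String encode(Date when, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // 2008-07-03 21:25:51 UTC, expressed as milliseconds since the epoch.
        Date posted = new Date(1215120351000L);
        System.out.println(encode(posted, "yyMMdd")); // prints 080703
    }
}
```

Because the fields run from most to least significant and are
zero-padded, the encoded strings sort in the same order as the
timestamps themselves (within one century).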

Now, I noticed that obviously if I index like the above, I won't get  
the correct sort order for several posts having been posted on the  
same day, so I changed it to index yyMMddHHmmss to index down to the  
second, rather than just the day. I didn't pay much attention to  
memory usage until I started getting out of heap space errors... When  
I looked into the usage I found:

(there are around 6,000,000 posts on the message board database)

Date encoded as yyMMdd: appears to be using around 30M
Date encoded as yyMMddHHmmss:  appears to be using more than 400M!

I would have understood if the usage had doubled, or even gone up a
little more than that (I have no idea how you guys encode the indexes,
if at all), but it has gone up more than tenfold, which I can't explain.
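
One way to compare the two encodings is to count how many distinct
values each produces over the same set of posts. The sketch below does
that for made-up, evenly spaced timestamps (not the real board data);
the class and method names are hypothetical, and whether the cache size
actually tracks the number of distinct values is an assumption, not
something confirmed here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TimeZone;

class DistinctTermCount {
    // Counts how many different strings the pattern yields for the timestamps.
    static int countDistinct(long[] epochMillis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone
        Set<String> distinct = new HashSet<String>();
        for (long t : epochMillis) {
            distinct.add(fmt.format(new Date(t)));
        }
        return distinct.size();
    }

    // Hypothetical sample data: n posts, each stepMillis after the previous one.
    static long[] samplePosts(int n, long startMillis, long stepMillis) {
        long[] posts = new long[n];
        for (int i = 0; i < n; i++) {
            posts[i] = startMillis + i * stepMillis;
        }
        return posts;
    }

    public static void main(String[] args) {
        // 100,000 made-up posts, one minute apart.
        long[] posts = samplePosts(100000, 0L, 60000L);
        System.out.println("yyMMdd:       " + countDistinct(posts, "yyMMdd"));
        System.out.println("yyMMddHHmmss: " + countDistinct(posts, "yyMMddHHmmss"));
    }
}
```

With day resolution, many posts share one value; with second
resolution, almost every post gets its own value.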

For now, I have just moved it back to do it on a per day basis, as  
it's not a huge deal, but can anyone help with this? Is there  
something I might be doing wrong? That's all I changed between the two  
runs, and it certainly seems to be repeatable. I tried upgrading from  
the previous version of Lucene to the latest one, but no difference.

Many thanks,

