lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Re: Performance question
Date Tue, 06 Jan 2004 12:47:51 GMT
Scott,

Here are some figures to use for comparison.  Using the latest Lucene
release, I index about 200 similar-sized XML files at a time on a Windows
XP machine (2 GHz).  First I create a new index, which adds the documents at
a rate of about 8 per second (I don't recall what the CPU % is during this).
Then I merge this new index with the master one (using, I think, the default
merge factor), which takes about 4.5 minutes (during which time the CPU
utilization stays near 100%).  The master index currently holds about
115,000 such documents.
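
To put these figures side by side with yours, here is a rough
back-of-the-envelope sketch in Java.  The class and method names are made
up for illustration, and the 4,500-byte file size and 2 seconds per file
are assumed midpoints of the ranges you quoted (3K-6K bytes, 1-3 seconds),
not measurements.

```java
class IndexingThroughput {
    // Effective documents per second for one batch, counting both the
    // time to add the documents and the time to merge into the master index.
    static double effectiveDocsPerSecond(int docs, double addRatePerSec, double mergeSeconds) {
        double addSeconds = docs / addRatePerSec;
        return docs / (addSeconds + mergeSeconds);
    }

    // Bytes indexed per minute for a given file size and time per file.
    static double bytesPerMinute(double bytesPerFile, double secondsPerFile) {
        return bytesPerFile * 60.0 / secondsPerFile;
    }

    public static void main(String[] args) {
        // Figures from this message: 200 documents, about 8 per second,
        // then a merge of about 4.5 minutes.
        double batch = effectiveDocsPerSecond(200, 8.0, 4.5 * 60);
        System.out.printf("batch incl. merge: %.2f docs/s%n", batch);

        // Assumed midpoints of the quoted ranges (hypothetical values).
        double quoted = bytesPerMinute(4500, 2.0);
        System.out.printf("quoted setup: %.0f bytes/min%n", quoted);
    }
}
```

So the add phase alone runs at about 8 documents per second, but once the
merge is counted the whole batch works out to well under 1 document per
second, which is in the same range as 1-3 seconds per file.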

HTH,

Regards,

Terry

----- Original Message -----
From: "Scott Smith" <SSmith@MainstreamData.com>
To: <lucene-user@jakarta.apache.org>
Sent: Monday, January 05, 2004 10:26 PM
Subject: Performance question


> I have an application that is reading in XML files and indexing them.  Each
> XML file is 3K-6K bytes.  This application preloads a database that I will
> add to "on the fly" later.  However, all I want it to do initially is take
> some existing files and create the initial index as quickly as I can.
>
> Since I want to index "on the fly" later, I set the merge factor to 10.  I'm
> assuming that I can't create the index initially with one merge factor
> (e.g., 100) and then change the merge factor later (true?).
>
> What I see is that it takes 1-3 seconds per XML file to do the index.  This
> means I'm indexing around 150k bytes per minute.  I also notice that the CPU
> utilization rarely exceeds 5% (looking at task manager on a Windows box).  I
> use Xerces to read in the files (SAX interface) and I don't close or
> optimize the index between stories nor do I sleep anyplace.  I've looked at
> the page fault numbers and they aren't changing much.  I guess I would have
> expected that I would have pretty much pegged the CPU and seen much faster
> indexing.
>
> Any ideas/suggestions?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
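
On the merge factor question quoted above: as far as I know, the merge
factor is a setting on the IndexWriter rather than something stored in the
index, so it should be possible to build with one value and switch to
another later.  To get a feel for the trade-off, here is a toy model in
Java.  It is a simplified sketch of a "merge the newest N equal-sized
segments" policy, not Lucene's actual code, and every name in it is made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MergeFactorModel {
    // Returns { documents rewritten by merges, segments left at the end }.
    static long[] simulate(int docs, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        long copied = 0;
        for (int i = 0; i < docs; i++) {
            // Each added document starts out as its own one-document segment.
            segments.add(1);
            // Merge while the newest mergeFactor segments are all the same size.
            while (tailIsUniform(segments, mergeFactor)) {
                int merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
                copied += merged; // every document in the merge is rewritten once
            }
        }
        return new long[] { copied, segments.size() };
    }

    // True if there are at least mergeFactor segments and the last
    // mergeFactor of them have equal size.
    static boolean tailIsUniform(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segments.get(n - 1);
        for (int k = n - mergeFactor; k < n; k++) {
            if (segments.get(k) != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int factor : new int[] { 10, 100 }) {
            long[] r = simulate(1000, factor);
            System.out.println("mergeFactor=" + factor
                    + " copied=" + r[0] + " segments=" + r[1]);
        }
    }
}
```

In this model, 1,000 documents cost 3,000 document copies at merge factor
10 and 1,000 copies at merge factor 100, but leave 1 segment versus 10.  A
larger factor does less merge work while indexing and leaves more segments
behind, which is the usual argument for a high value during a bulk load.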
