lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Outar" <>
Subject RE: Indexing Growth
Date Wed, 02 Apr 2003 14:26:13 GMT
Just about everything calls getValue:

 public synchronized String getValue(String key, File file)
    throws ParseException,  IOException {

        Document doc = getDocument(file);
        return doc.get(key.toLowerCase());

which calls get document:

 private synchronized Document getDocument(File file) throws
    IOException {

        Term t       = new Term(PATH,
        TermQuery tQ = new TermQuery(t);

        Hits hits    =;

        if (hits.length() == 1) {
            return hits.doc(0);
        //this should never happen, cannot have a URL that returns 2 hits
        //that would mean the same file has been indexed twice
        else {
            return null;



-----Original Message-----
From: Michael Barry []
Sent: Wednesday, April 02, 2003 9:20 AM
To: Lucene Users List
Subject: Re: Indexing Growth

Sounds like you either have an indexer that's run amok (maybe
a background process that's continually re-indexing your sandbox -
or expanding outside your sandbox) or your Query code is doing more
than querying. It's not behaviour I've seen. Without a snippet of
Query code, it's going to be hard to help.

Rob Outar wrote:

>Hi all,
>	This is too odd and I do not even know where to start.  We built a Windows
>Explorer type tool that indexes all files in a "sabdboxed" file system.
>Each Lucene document contains stuff like path, parent directory, last
>modified date, file_lock etc..  When we display the files in a given
>directory through the tool we query the index about 5 times for each file
>the repository, this is done so we can display all attributes in the index
>about that file.  So for example if there are 5 files in the directory,
>file has 6 attributes that means about 30 term queries are executed.  The
>initial index when build it about 10.4megs, after accessing about 3 or 4
>directories the index size increased to over 100megs, and we did not add
>anything!!  All we are doing is querying!!  Yesterday after querying became
>ungodly slow, we looked at the index size it had grown from 10megs to 1.5GB
>(granted we tested the tool all morning).  But I have no idea why the index
>is growing like this.  ANY help would be greatly appreciated.

To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message