lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Peculiar (?) Indexing Performance
Date Tue, 13 Jan 2004 21:35:43 GMT
I just aborted a re-indexing operation (because it was taking too much time - will run it overnight
instead).  But I was surprised by what I found in the index directory, which contained a total
of 1,402 index files!  It started out with 36 files with the name of "_I9a.*", followed by
groups of 72 files with names like "_17si.*" and so forth.

Is this normal?
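
To see how those files are spread across segments, one could group the directory listing by file-name prefix. This is only a rough sketch: it assumes every index file is named `<segment>.<extension>` (as names like "_17si.*" suggest), and the class `SegmentFileCounter` and its method are hypothetical helpers that use nothing beyond `java.io.File`.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

public class SegmentFileCounter {

    // Groups file names by the part before the first '.', which is assumed
    // here to be the segment name, and counts the files in each group.
    // Names without a '.' are counted under their full name.
    public static Map<String, Integer> countBySegment(String[] fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            int dot = name.indexOf('.');
            String segment = (dot < 0) ? name : name.substring(0, dot);
            counts.merge(segment, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // args[0] = index directory to inspect
        if (args.length < 1) {
            System.err.println("usage: java SegmentFileCounter <index-dir>");
            return;
        }
        String[] names = new File(args[0]).list();
        if (names == null) {
            System.err.println("not a readable directory: " + args[0]);
            return;
        }
        for (Map.Entry<String, Integer> e : countBySegment(names).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " files");
        }
    }
}
```

Run against the index directory, this prints one line per prefix, which makes it easy to see how many segments exist and how many files each one holds.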

Also, I noticed that during the indexing it would chug along, indexing at a pretty decent
rate, and then, every so often (I would estimate every several hundred added files) it would
stop for perhaps 10 - 30 seconds (occasionally longer), doing a bunch of disk activity.  Then
it would resume again - almost like it was optimizing.  (I'm doing this on a notebook, so
the disk IO is probably fairly slow.)

Is this normal?
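
One way to pin down those pauses would be to time each add and keep a list of the slow ones. The sketch below is not tied to Lucene: the `PauseLogger` class, its method names, and the 5-second threshold are assumptions made up for illustration, and the durations in `main` are simulated rather than measured.

```java
import java.util.ArrayList;
import java.util.List;

public class PauseLogger {
    private final long thresholdMillis;
    private final List<String> slow = new ArrayList<>();

    public PauseLogger(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Remembers the call if it took at least thresholdMillis.
    public void record(String label, long elapsedMillis) {
        if (elapsedMillis >= thresholdMillis) {
            slow.add(label + " took " + elapsedMillis + " ms");
        }
    }

    // Returns the calls recorded so far, in the order they were seen.
    public List<String> slowCalls() {
        return slow;
    }

    public static void main(String[] args) {
        PauseLogger log = new PauseLogger(5000); // 5 s threshold, an arbitrary choice
        // Simulated durations; in the indexer these would be measured
        // around each call that adds a document.
        log.record("a.xml", 40);
        log.record("b.xml", 12000);
        for (String s : log.slowCalls()) {
            System.out.println(s);
        }
    }
}
```

In the indexer, the idea would be to read `System.currentTimeMillis()` before and after `writer.addDocument(...)` and pass the difference to `record`, along with a running count of added files, to see whether the pauses recur at a regular interval.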

Regards,

Terry

PS: The code I'm using to do the indexing is below:

import npg1.search.WebExecAnalyzer;
import org.apache.lucene.index.IndexWriter;
import npg1.search.WESimilarity2;
import npg1.search.WPDocument2a;

import java.io.File;
import java.util.Date;

class IndexWPFiles2a {
  public static void main(String[] args) {
 
 //args[0] = location of target directory to be indexed
 //args[1] = location of index directory (in which to create index files)
 //args[2] = if present (any value), open the existing index instead of creating a new one
 
 System.out.println("starting"); 
 try {
  Date start = new Date();
  
  String target = "c:/master_db/master_xml";
  if(args.length > 0) { // args[0] throws ArrayIndexOutOfBoundsException when absent
   target = args[0];
  }  
  String index = "c:/master_db/master_index";
  if(args.length > 1) { // same check for the optional index directory
   index = args[1];
  }
        
  IndexWriter writer = null;
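  if(args.length < 3) {
   // create a new index (overwriting any existing one) and add every file under target
   writer = new IndexWriter(index, new WebExecAnalyzer(), true);
   writer.mergeFactor = 50; // default is 10; a higher value merges segments less often
   writer.setSimilarity(new WESimilarity2());
   indexDocs(writer, new File(target));
  } else {
   // open the existing index; no documents are added here, so it is only optimized below
   writer = new IndexWriter(index, new WebExecAnalyzer(), false);
   writer.setSimilarity(new WESimilarity2());
  }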
  writer.optimize();
  writer.close();
  
  Date end = new Date();
  
  System.out.print(end.getTime() - start.getTime());
  System.out.println(" total milliseconds");
 
 } catch (Exception e) {
  System.out.println(" caught a " + e.getClass() +
    "\n with message: " + e.getMessage());
 }
  }

  public static void indexDocs(IndexWriter writer, File file)
       throws Exception {
 //System.out.println("starting indexing with internal path"); 
 if (file.isDirectory()) {
  String[] files = file.list();
  for (int i = 0; i < files.length; i++){
   //System.out.println("recursive call");
   indexDocs(writer, new File(file, files[i]));
  }
 } else {
  try {
   System.out.println("adding " + file);
   writer.addDocument(WPDocument2a.Document(file));
  } catch (Exception e) {
   System.out.println("error adding "+file+" -  Exception: "+e.getMessage());
  }
 }
  }
}
