lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject lucene as a graph store
Date Tue, 15 Jan 2008 12:17:49 GMT
Hello,

I would like to use Lucene as a graph store, where the graph is represented
as a list of edges. Consider the code below:

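    import java.util.Random;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumberTools;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    // Wrapper class added so the snippet compiles (Lucene 2.x API)
    public class GraphIndexer {
        public static void main(String[] args) throws Exception {
            // Despite the name, nothing is committed here: this only
            // controls how often indexing time is printed.
            final int commitCount = 16 * 1024;
            final int numObj = 1024 * 1024;   // number of edges to add

            // KeywordAnalyzer indexes each field value as a single token
            Analyzer analyzer = new KeywordAnalyzer();
            FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");
            // true = create a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(directory, analyzer, true);

            Random r = new Random(System.currentTimeMillis());
            long start = System.currentTimeMillis();

            for (int i = 0; i < numObj; i++) {
                // One document per edge: srcKey -> dstKey plus two attributes
                Document doc = new Document();
                doc.add(new Field("srcKey", NumberTools.longToString(i),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("dstKey", NumberTools.longToString(r.nextInt(numObj)),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("linkKey", NumberTools.longToString(r.nextInt(16)),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("linkValue", NumberTools.longToString(r.nextInt(256)),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                writer.addDocument(doc);

                // Print elapsed milliseconds for each batch of commitCount edges
                if (i % commitCount == 0) {
                    long now = System.currentTimeMillis();
                    System.out.println(i + ":" + (now - start));
                    start = now;
                }
            }

            writer.optimize();   // merge all segments into one
            writer.close();
            directory.close();
        }
    }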


Basically, I am adding a large number of documents, one per edge: srcKey = i,
dstKey = a random node, plus two other string fields, linkKey and linkValue.
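The key fields are encoded with NumberTools.longToString rather than plain
Long.toString. According to the NumberTools documentation, its strings are
structured so that lexicographic order matches numeric order, which matters
because Lucene orders terms as strings. The sketch below shows the idea in
plain Java with a hypothetical fixed-width base-36 encoder (SortableKeys and
encode are illustrative names, and this is NOT NumberTools' exact format):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortableKeys {
    // Long.MAX_VALUE written in base 36 has 13 digits
    static final int WIDTH = 13;

    // Hypothetical encoder: base 36, left-padded with '0' to a fixed width,
    // so comparing the strings gives the same order as comparing the numbers.
    static String encode(long value) {
        if (value < 0) throw new IllegalArgumentException("non-negative values only");
        String digits = Long.toString(value, 36); // lowercase digits/letters
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) sb.append('0');
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        long[] ids = {1048575L, 7L, 36L, 300L};
        List<String> plain = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long id : ids) {
            plain.add(Long.toString(id));
            padded.add(encode(id));
        }
        Collections.sort(plain);
        Collections.sort(padded);
        System.out.println(plain);  // [1048575, 300, 36, 7]: not numeric order
        System.out.println(padded); // encodings of 7, 36, 300, 1048575 in that order
    }
}
```

Plain decimal strings sort "300" before "36"; padding to a fixed width
removes that problem for non-negative keys such as the node ids above.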

Compared to a relational database or an OODBMS such as Perst or db4o, Lucene
takes longer to index. However, it is much faster at searching for and
retrieving records.

I can perform 16,384 random lookups over 1 million entries in 0.8 seconds,
which is an excellent result. (I have been benchmarking for a long time.)
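For scale, the benchmark figure of 16,384 lookups in 0.8 seconds works out
as follows, assuming the lookups run one after another:

```latex
t_{\text{lookup}} = \frac{0.8\ \text{s}}{16384} \approx 4.88 \times 10^{-5}\ \text{s} \approx 48.8\ \mu\text{s},
\qquad
\text{throughput} = \frac{16384}{0.8\ \text{s}} = 20480\ \text{lookups/s}
```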

Typically, in a B-tree based structure (in an OODBMS, for example), search and
insertion times increase as the number of objects grows.
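For sorted structures this growth is usually logarithmic rather than linear.
The plain-Java sketch below is not Lucene code (LookupCost and its methods
are hypothetical names): it counts binary-search probes over a sorted array,
standing in for any sorted index such as a B-tree, to show how the
worst-case cost changes when the number of entries grows 1024-fold.

```java
public class LookupCost {
    // Number of probes a binary search over a sorted array needs for key.
    static int comparisons(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            probes++;
            if (sorted[mid] == key) return probes;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return probes; // key not present
    }

    // Worst case over every key stored in a sorted array of n keys.
    static int worstCase(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i; // already sorted
        int worst = 0;
        for (int i = 0; i < n; i++) {
            worst = Math.max(worst, comparisons(keys, i));
        }
        return worst;
    }

    public static void main(String[] args) {
        // 1024x more entries costs only 10 extra probes per lookup
        System.out.println(worstCase(1024));    // 11
        System.out.println(worstCase(1 << 20)); // 21
    }
}
```

A B-tree compares against several keys per node, but the number of levels
still grows with the logarithm of the entry count.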

Will Lucene have the same problem, and if so, how can I overcome it? Looking
at the code above, does anyone have recommendations for improving indexing
performance? (And what can I do to improve search performance?)

When searching with an IndexSearcher, does Lucene do any caching? Usually MRU
caches are used for this.
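The kind of cache mentioned here keeps the most recently used entries and
evicts the least recently used one when full. A minimal application-side
sketch, assuming results are cached by key outside Lucene (RecentlyUsedCache
is a hypothetical name; this says nothing about Lucene's own internals), can
be built on java.util.LinkedHashMap in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Keeps the most recently accessed entries; evicts the least recently
// used entry once more than `capacity` entries are stored.
public class RecentlyUsedCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public RecentlyUsedCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently used entry
    }

    public static void main(String[] args) {
        RecentlyUsedCache<String, String> cache = new RecentlyUsedCache<>(2);
        cache.put("a", "edge-a");
        cache.put("b", "edge-b");
        cache.get("a");           // touch "a" so "b" becomes the eldest
        cache.put("c", "edge-c"); // exceeds capacity, evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Note that LinkedHashMap is not thread-safe; a searcher shared across
threads would need external synchronization around such a cache.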

Any ideas, help, or recommendations are greatly appreciated.

Best,
-C.B.
