lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: lucene as a graph store
Date Tue, 15 Jan 2008 16:59:42 GMT
Re indexing performance, you are not making use of various IndexWriter parameters.  My suggestion:
wait another week, Lucene 2.3 will be out then.  Check IndexWriter javadocs for various knobs
for improving indexing performance.  Actually, check the Wiki, there is a page about just
that there.

Yes, the larger the index, the slower the search, of course, and your tests can show you when
the speed becomes unacceptable for your type of index, hardware, search rate, etc.  What you
do then is up to you.  You can spread the search load over multiple boxes, or you can split
your index and do multiple searches in parallel over smaller indices sitting on multiple boxes.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Cam Bazz <cambazz@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 7:17:49 AM
Subject: lucene as a graph store

Hello;

I like to use lucene as a graph store. The graph representation is a
 list of
edges. Consider the code below:

        final int commitCount = 16 * 1024;
        final int numObj =  1024 * 1024;

        Analyzer analyzer = new KeywordAnalyzer();
        FSDirectory directory =
 FSDirectory.getDirectory("c:\\LuceneAdd");
        IndexWriter writer = new IndexWriter(directory, analyzer,
 true);

        Document doc;
        long start = System.currentTimeMillis();

        Random r = new Random(System.currentTimeMillis());

        for(int i=0; i<numObj; i++) {
            doc = new Document();
            doc.add(new Field("srcKey", NumberTools.longToString(i),
Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("dstKey",
NumberTools.longToString(r.nextInt(numObj)),
Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("linkKey",
NumberTools.longToString(r.nextInt(16)),
Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("linkValue", NumberTools.longToString(
r.nextInt(256)), Field.Store.YES, Field.Index.UN_TOKENIZED));

           writer.addDocument(doc);

            if(i%commitCount==0) {
                 long now = System.currentTimeMillis();
                 System.out.println(i + ":" + (now-start));
                 start = now;
            }

        }

        writer.optimize();
        writer.close();
        directory.close();


Basically I am adding a large number of documents from srcKey = i to
 dstKey
= random and two other string fields - linkKey and linkValue.

Compared to a normal database store, or an oodbms such as perst or db4o
 -
lucene takes longer to index.
However, it is much faster in searching, finding, retrieving records.

I can make 16384 random lookups over 1Million entries in 0.8 seconds.
 This
is excellent time. (I have been benchmarking for a long time)

Typically, when number of objects in BTree based structure in an oodbms
 for
example increase, the search and add times also increase.

Will lucene have the same problem and how can I overcome it if it does.
Looking at the above code - does anyone has any recomendations
to improve index performance. (also what can I do to improve search
performance)

While searching with an indexsearcher - does lucene do any caching?
 usually
MRU caches are used to accomplish this.

Any ideas,help,recomendations greatly appreciated.

Best,
-C.B.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message