lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject Re: lucene as a graph store
Date Tue, 15 Jan 2008 13:50:07 GMT
For things like PageRank, centrality metric calculations, or Dijkstra's
shortest path, this kind of graph (a list of edges) usually does not give
the best performance.
I would like to use Lucene for simple operations, like finding the neighbors
of a node, or the 2-degree neighbors of a node.
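
To make the two operations concrete, here is a minimal sketch in plain Java rather than Lucene. It assumes the graph is a list of directed (src, dst) edges like the documents in the quoted code, and that "2-degree neighbors" means nodes reachable in exactly two hops, excluding the node itself and its direct neighbors. The class and method names are made up for illustration; with Lucene, each map lookup would instead be a term lookup on srcKey.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: an in-memory edge list
// grouped by source key.
class EdgeListGraph {
    // srcKey -> all dstKeys that srcKey links to
    private final Map<Long, List<Long>> outgoing = new HashMap<>();

    public void addEdge(long src, long dst) {
        outgoing.computeIfAbsent(src, k -> new ArrayList<>()).add(dst);
    }

    // Direct (1-degree) neighbors: every dst of an edge starting at node.
    public Set<Long> neighbors(long node) {
        List<Long> targets = outgoing.getOrDefault(node, Collections.<Long>emptyList());
        return new LinkedHashSet<>(targets);
    }

    // 2-degree neighbors: neighbors of neighbors, minus the direct
    // neighbors and the starting node (an assumed definition).
    public Set<Long> twoDegreeNeighbors(long node) {
        Set<Long> first = neighbors(node);
        Set<Long> second = new LinkedHashSet<>();
        for (Long n : first) {
            second.addAll(neighbors(n));
        }
        second.removeAll(first);
        second.remove(Long.valueOf(node));
        return second;
    }
}
```

Each 2-degree query costs one lookup for the node plus one lookup per direct neighbor, which is why fast single-key lookups matter for this use.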

Is updating an index a costly operation in Lucene? I don't think there will
be many updates, but rather deletes.

When I delete documents from an index, what should I be careful about?

Best.
-C.B.



On Jan 15, 2008 3:31 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> I guess the question comes down to what kind of things are you going
> to do w/ this graph?  How often are you updating links, etc?  I can't
> say Lucene was designed for this kind of thing, but I am constantly
> amazed at what people use Lucene for, so I won't say it can't be
> done.  I don't know how efficient it would be for doing things like
> PageRank or other graph algorithms, but I would be interested in
> hearing more about what you have in mind.
>
> Lucene doesn't do much caching at the document level, though that is
> fairly easy to implement. It does bring some terms, etc. into
> memory, and you may want to have a look at the FieldCache.
>
> -Grant
>
> On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote:
>
> > Hello;
> >
> > I would like to use Lucene as a graph store. The graph representation
> > is a list of edges. Consider the code below:
> >
> >        final int commitCount = 16 * 1024;
> >        final int numObj = 1024 * 1024;
> >
> >        Analyzer analyzer = new KeywordAnalyzer();
> >        FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");
> >        IndexWriter writer = new IndexWriter(directory, analyzer, true);
> >
> >        Document doc;
> >        long start = System.currentTimeMillis();
> >
> >        Random r = new Random(System.currentTimeMillis());
> >
> >        for (int i = 0; i < numObj; i++) {
> >            doc = new Document();
> >            doc.add(new Field("srcKey", NumberTools.longToString(i),
> >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("dstKey", NumberTools.longToString(r.nextInt(numObj)),
> >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("linkKey", NumberTools.longToString(r.nextInt(16)),
> >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("linkValue", NumberTools.longToString(r.nextInt(256)),
> >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> >
> >            writer.addDocument(doc);
> >
> >            if (i % commitCount == 0) {
> >                long now = System.currentTimeMillis();
> >                System.out.println(i + ":" + (now - start));
> >                start = now;
> >            }
> >        }
> >
> >        writer.optimize();
> >        writer.close();
> >        directory.close();
> >
> >
> > Basically, I am adding a large number of documents, each with
> > srcKey = i, dstKey = random, and two other string fields, linkKey
> > and linkValue.
> >
> > Compared to a normal database store, or an OODBMS such as Perst or
> > db4o, Lucene takes longer to index.
> > However, it is much faster at searching, finding, and retrieving records.
> >
> > I can make 16384 random lookups over 1 million entries in 0.8
> > seconds. This is an excellent time. (I have been benchmarking for a
> > long time.)
> >
> > Typically, when the number of objects in a BTree-based structure (in
> > an OODBMS, for example) increases, the search and add times also
> > increase.
> >
> > Will Lucene have the same problem, and how can I overcome it if it
> > does?
> > Looking at the above code, does anyone have any recommendations
> > to improve indexing performance? (Also, what can I do to improve
> > search performance?)
> >
> > While searching with an IndexSearcher, does Lucene do any caching?
> > Usually MRU caches are used to accomplish this.
> >
> > Any ideas, help, or recommendations are greatly appreciated.
> >
> > Best,
> > -C.B.
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
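
On the document-level caching mentioned in the quoted reply: a minimal sketch of such a cache in plain Java, assuming a bounded map with least-recently-used eviction (so the most recently used entries are the ones kept). The class name and the size limit are made up for illustration. The key would be whatever identifies a document in the application (for example, srcKey), and the value would be the loaded document. Nothing here is part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a small bounded cache built on
// LinkedHashMap in access order. Not thread-safe; wrap it or synchronize
// externally if several searcher threads share it.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least recently
        // accessed to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true evicts the
    // least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A lookup would first try `cache.get(key)` and, on a miss, load the document from the index and `put` it into the cache.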
