lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zareen syed" <zareens...@gmail.com>
Subject Indexing large corpus of wikipedia
Date Wed, 30 May 2007 18:08:52 GMT
Hi Everybody,

I would appreciate if lucene users could give me some advice about the
project I am doing.
I want to use lucene to index wikipedia and do searches on it as a part of
my project.
- Firstly, I am concerned about the size of wikipedia. Is it feasible to do
it on a single machine
  with 1 GB of Physical Memory and 1.3GHz processor. Can lucene handle it
efficiently.
- Secondly, I wanted to know that when doing search does lucene load the
whole index in memory
   or if the index size is larger than memory then would the search be
significantly slowed down.
- Thirdly, I want to pass documents as queries to wikipedia corpus, can
lucene handle such large
   queries equivanlent to documents sizes, if not what modifications do I
have to make.
- Fourthly, would it be efficient to keep subsets of index on different
machines and then distribute the query to all of them.

Any suggestions on how to improve the performance with lucene considering
the problem I mentioned above.

Thank you all,
Zareen Saba Syed

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message