lucene-java-user mailing list archives

From "Glen Newton" <glen.new...@gmail.com>
Subject Re: Does Lucene Supports Billions of data
Date Wed, 30 Apr 2008 14:37:07 GMT
I understand. But it depends on the implementation: if there are things in
Lucene that are O(n^2) or worse, then Moore's Law will not help at large
scale. But if they are mostly O(n) or O(n log n), then we can wait for
bigger, faster machines with more cores to let us use Lucene for billions
of documents. You can go out now and buy a 64 dual-core Sun SPARC box,
which would likely scale better than any networked solution.
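
To put rough numbers on that (a back-of-the-envelope illustration only, not
measured Lucene behaviour), compare the operation counts at n = 2 billion:

    // Rough growth comparison at n = 2 billion documents (illustrative only).
    public class Scaling {
        public static void main(String[] args) {
            double n = 2e9;
            double nLogN = n * (Math.log(n) / Math.log(2));  // ~6.2e10
            double nSquared = n * n;                         // 4.0e18
            System.out.printf("n log n ~= %.1e ops%n", nLogN);
            System.out.printf("n^2     ~= %.1e ops%n", nSquared);
            // The n^2 cost is roughly 6.5e7 times larger: a gap that faster
            // hardware (a constant factor) cannot close.
            System.out.printf("ratio   ~= %.1e%n", nSquared / nLogN);
        }
    }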

But of course, as you point out, if we are going with commodity hardware,
the Google-style distributed back-end solution is the way to go...
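
For the concrete case in the original question below (three ~14 GB index
stores), a minimal single-box sketch is to open each store as a shard and
search them through one MultiSearcher. This uses the Lucene 2.x-era API;
the paths, field name, and query here are placeholders, not details taken
from the thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;

    public class ShardedSearch {
        public static void main(String[] args) throws Exception {
            // One IndexSearcher per 14 GB index store (placeholder paths).
            Searchable[] shards = new Searchable[] {
                new IndexSearcher("/data/index1"),
                new IndexSearcher("/data/index2"),
                new IndexSearcher("/data/index3")
            };
            MultiSearcher searcher = new MultiSearcher(shards);

            Query q = new QueryParser("body", new StandardAnalyzer()).parse("lucene");
            TopDocs top = searcher.search(q, null, 10);  // top 10 hits across all shards
            System.out.println("total hits: " + top.totalHits);

            searcher.close();
        }
    }

ParallelMultiSearcher does the same thing across threads. If the
out-of-memory errors persist, the JVM heap (-Xmx) probably needs to be
raised as well, since sorting and field caches over three 14 GB indexes
can use a lot of memory.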

-glen

2008/4/30 John Wang <john.wang@gmail.com>:
> I am not sure how well Lucene would perform with > 2 billion docs in a
>  single index anyway.
>  I posted a while ago about considering different ways of building
>  distributed search. A master-slave hierarchical model has been the norm;
>  I was hoping to see more systems built on top of a Hadoop-like
>  infrastructure where scaling is seamless. Ning at IBM has written some
>  cool stuff into HBase for building index shards from an HBase table.
>
>  -John
>
>
>
>  On Wed, Apr 30, 2008 at 9:46 PM, Glen Newton <glen.newton@gmail.com> wrote:
>
>  > I have created indexes with 1.5 billion documents.
>  >
>  > It was experimental: I took an index with 25 million documents and
>  > merged it with itself many times. While not definitive, since there
>  > were only 25M unique documents being duplicated, it did suggest that
>  > Lucene should be able to handle this number of (unique) documents.
>  > (A rough sketch of this self-merge approach follows the quoted thread
>  > below.)
>  >
>  > That said, Lucene needs to support >2B docs, so docids (and all
>  > associated internals) need to become 'long' fairly soon...
>  >
>  > -Glen
>  >
>  > 2008/4/30 John Wang <john.wang@gmail.com>:
>  > > Lucene docids are represented as a Java int, so the max signed int
>  > > (2,147,483,647) would be the limit: a little over 2 billion.
>  > >
>  > >  -John
>  > >
>  > >
>  > >
>  > >  On Wed, Apr 30, 2008 at 11:54 AM, Sebastin <sebasmtech@gmail.com> wrote:
>  > >
>  > >  >
>  > >  > Hi All,
>  > >  > Does Lucene support billions of documents in a single index store
>  > >  > of 14 GB per search? I have 3 index stores of 14 GB each that I
>  > >  > need to search and retrieve results from, and it throws an
>  > >  > out-of-memory error while searching these index stores.
>  > >  > --
>  > >  > View this message in context:
>  > >  > http://www.nabble.com/Does-Lucene-Supports-Billions-of-data-tp16974808p16974808.html
>  > >  > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>  > >  >
>  > >  >
>  > >  >
>  > >  >
>  > >
>  >
>  >
>  >
>  > --
>  >
>  > -
>  >
>  >
>  >
>
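
The 1.5-billion-document experiment quoted above can be reproduced along
these lines: repeatedly merge a 25M-document index into a new index with
IndexWriter.addIndexes. This is only a sketch of the idea, not the exact
code used for that experiment; the paths and iteration count are
placeholders (60 x 25M is roughly 1.5 billion):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class SelfMerge {
        public static void main(String[] args) throws Exception {
            Directory source = FSDirectory.getDirectory("/data/index-25M");  // placeholder
            Directory target = FSDirectory.getDirectory("/data/index-1_5B"); // placeholder

            IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(), true);
            for (int i = 0; i < 60; i++) {
                // Each pass appends another copy of the 25M-doc index.
                writer.addIndexes(new Directory[] { source });
            }
            writer.optimize();   // merge segments down before searching
            writer.close();
        }
    }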



-- 

-

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

