lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: How large is your solr index?
Date Sat, 03 Jan 2015 21:51:02 GMT
Back in June on a similar thread I asked "Anybody care to forecast when
hardware will catch up with Solr and we can routinely look forward to
newbies complaining that they indexed "some" data and after only 10 minutes
they hit this weird 2G document count limit?"

Still not there.

So the race is on between when Lucene will relax the 2G limit and when
hardware gets fast enough that 2G documents can be indexed within a small
number of hours.


-- Jack Krupansky

On Sat, Jan 3, 2015 at 4:00 PM, Toke Eskildsen <te@statsbiblioteket.dk>
wrote:

> Erick Erickson [erickerickson@gmail.com] wrote:
> > Of course I wouldn't be doing the work so I really don't have much of
> > a vote, but it's not clear to me at all that enough people would actually
> > have a use-case for 2b+ docs in a single shard to make it
> > worthwhile. At that scale GC potentially becomes really unpleasant for
> > instance....
>
> Over the last years we have seen a few use cases here on the mailing list.
> I would be very surprised if the number of such cases does not keep rising.
> Currently the work for a complete overhaul does not measure up to the
> rewards, but that is slowly changing. At the very least I find it prudent
> to not limit new Lucene/Solr interfaces to ints.
>
> As for GC: Right now a lot of structures are single-array oriented (for
> example using a long-array to represent bits in a bitset), which might not
> work well with current garbage collectors. A change to higher limits also
> means re-thinking such approaches: If the garbage collectors likes objects
> below a certain size then split the arrays into that. Likewise, iterations
> over structures linear in size to the index could be threaded. These are
> issues even with the current 2b limitation.
>
> - Toke Eskildsen
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message