lucene-java-user mailing list archives

From Cristian Lorenzetto <cristian.lorenze...@gmail.com>
Subject Re: docid is just a signed int32
Date Sun, 21 Aug 2016 17:31:36 GMT
I'm looking over TopDocs.merge.

What is the difference between using multiple IndexSearchers and then
merging the results with TopDocs.merge, and using a MultiReader?
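
To make the question concrete, here is a minimal sketch in plain Java (no
Lucene dependency) of how the two approaches appear to address documents.
With a composite reader such as MultiReader, each sub-reader's doc IDs are
shifted by a base offset into one shared int space, so the combined count
must still fit in an int. With separate searchers plus TopDocs.merge, each
merged hit keeps its per-shard doc ID together with a shard index. The names
ShardAddressing, Hit, docBases and mergeTopN are hypothetical stand-ins for
illustration, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShardAddressing {

    // Hypothetical stand-in for a search hit: a merged hit carries a
    // per-shard doc ID plus the index of the shard it came from.
    static final class Hit {
        final int shardIndex;
        final int doc;
        final float score;

        Hit(int shardIndex, int doc, float score) {
            this.shardIndex = shardIndex;
            this.doc = doc;
            this.score = score;
        }
    }

    // Composite-reader style: give each shard a base offset in one int space.
    // The running total is kept in a long so overflow is detected, not wrapped.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        long total = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = (int) total; // safe: total was range-checked on the previous pass
            total += maxDocs[i];
            if (total > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("too many documents for int doc IDs: " + total);
            }
        }
        return bases;
    }

    // Merge style: combine per-shard hits by descending score and keep
    // (shardIndex, doc) as the address instead of rebasing into one int.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int topN) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                .thenComparingInt(h -> h.shardIndex)
                .thenComparingInt(h -> h.doc));
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {10, 20, 5});
        // A composite doc ID is the shard's base plus the local doc ID.
        System.out.println("local doc 3 in shard 1 -> composite doc " + (bases[1] + 3));
    }
}
```

In this sketch the merged list can address more documents overall than a
single int allows, because the shard index is carried separately, but each
shard on its own is still bound by the int doc ID limit.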

2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto <cristian.lorenzetto@gmail.com>:

> In my opinion this experiment doesn't tell us anything new. Obviously,
> if you try to retrieve all the data in a store with a single query, the
> performance will not be good. Lucene is fantastic, but it is not magic: the
> laws of physics still apply to Lucene. A query is designed to retrieve a
> small part of a big store, not the whole store. In addition, I think the
> time would be poor even if you didn't sort the documents. Using a persisted
> sorted linked list, I don't see significant delays. Honestly, I also don't
> understand the GC memory limit in relation to Lucene's algorithms. The
> amount of memory used is not proportional to the size of the data store;
> otherwise Lucene would not be scalable. For me the question to analyze is
> a different one: considering how big data has grown in recent years, the
> typical maximum size of the databases we know, and whether or not sharding
> in Lucene can be scaled up with arrays that are defined dynamically or not,
> we can evaluate whether this refactoring makes sense.
>
> Sent from iPad
>
> > On 19 Aug 2016, at 05:50, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > OK, I'm a little out of my league here, but I'll plow on anyway....
> >
> > bq: There are use cases out there where >2^31 does make sense in a
> > single index
> >
> > Ok, let's put some definition to this and define the use-case
> > specifically rather than be vague. I've just run an experiment for
> > instance where I had 200M docs in a single shard (very small docs) and
> > tried to sort by a date on all of them. Performance on the order of
> > 5 seconds. 3B is what, 75 seconds? Does the use-case involve sorting?
> > Faceting? If so the performance will probably be poor.
> >
> > This would be huge surgery I believe, and there hasn't been a
> > compelling use-case in the search world for it. Unless and until that
> > case is made I suspect this idea will meet with a lot of resistance.
> >
> > That said, I do understand that this is somewhat akin to "Nobody will
> > ever need more than 64K of ram", meaning that some limits are assumed
> > and eventually become outmoded. But given Java's issues with memory and
> > GC I suspect that it'll be really hard to justify the work this would
> > take.
> >
> > FWIW,
> > Erick
> >
> >
> >> On Thu, Aug 18, 2016 at 6:31 PM, Trejkaz <trejkaz@trypticon.org> wrote:
> >>> On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand <jpountz@gmail.com> wrote:
> >>> No, IndexWriter enforces that the number of documents cannot go over
> >>> IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
> >>> BaseCompositeReader computes the number of documents in a long
> >>> variable and ensures it is less than 2^31, so you cannot have indexes
> >>> that contain more than 2^31 documents.
> >>>
> >>> Larger collections should be written to multiple shards and use
> >>> TopDocs.merge to merge results.
> >>
> >> But hang on:
> >> * TopDocs#merge still returns a TopDocs.
> >> * TopDocs still uses an array of ScoreDoc.
> >> * ScoreDoc still uses an int doc ID.
> >>
> >> Looks like you're still screwed.
> >>
> >> I wish IndexReader would use long IDs too, because one IndexReader can
> >> be across multiple shards too - it doesn't make much sense to me that
> >> this is restricted, although "it's hard to fix in a
> >> backwards-compatible way" is certainly a good reason. :D
> >>
> >> TX
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
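
The arithmetic in the quoted thread can be written out as a short sketch. It
assumes sort time grows linearly with document count, which is the assumption
behind going from about 5 seconds at 200M docs to 75 seconds at 3B; real
sorting cost may well grow faster. It also assumes a per-index cap of
Integer.MAX_VALUE - 128, which is believed to be the value of
IndexWriter.MAX_DOCS (the thread only says "a bit less than 2^31"). The class
and method names are hypothetical.

```java
final class ShardEstimate {

    // Assumed per-index document cap; the thread only says "a bit less than 2^31".
    static final long ASSUMED_MAX_DOCS = Integer.MAX_VALUE - 128;

    // Linear extrapolation of a measured time to a larger document count.
    static double linearSeconds(double measuredSeconds, long measuredDocs, long targetDocs) {
        return measuredSeconds * ((double) targetDocs / measuredDocs);
    }

    // Minimum number of shards so that no shard exceeds the cap (ceiling division).
    static long shardsNeeded(long totalDocs, long capPerShard) {
        return (totalDocs + capPerShard - 1) / capPerShard;
    }

    public static void main(String[] args) {
        long measuredDocs = 200_000_000L;
        long targetDocs = 3_000_000_000L;
        System.out.println("estimated seconds: " + linearSeconds(5.0, measuredDocs, targetDocs));
        System.out.println("shards needed: " + shardsNeeded(targetDocs, ASSUMED_MAX_DOCS));
    }
}
```

Under these assumptions a 3B-document collection needs at least two shards,
which is the situation where TopDocs.merge comes into play.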
