lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Hastings <hastings.recurs...@gmail.com>
Subject Re: mixed index with commongrams
Date Thu, 03 Aug 2017 15:48:01 GMT
Thanks, thats what i kind of expected.  still debating whether the space
increase is worth it, right now Im at .7% of searches taking longer than 10
seconds, and 6% taking longer than 1, so when i see things like this in the
morning it bugs me a bit:

2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts
of Equity of the United States")
2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
justice")

which could all be annihilated with CG's, at the expense, according to HT,
of a 40% increase in index size.



On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> bq: will that search still return results form the earlier documents
> as well as the new ones
>
> In a word, "no". By definition the analysis chain applied at index
> time puts tokens in the index and that's all you have to search
> against for the doc unless and until you re-index the document.
>
> You really have two choices here:
> 1> live with the differing results until you get done re-indexing
> 2> index to an offline collection and then use, say, collection
> aliasing to make the switch atomically.
>
> Best,
> Erick
>
> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
> <hastings.recursive@gmail.com> wrote:
> > Hey all, I have yet to run an experiment to test this but was wondering
> if
> > anyone knows the answer ahead of time.
> > If i have an index built with documents before implementing the
> commongrams
> > filter, then enable it, and start adding documents that have the
> > filter/tokenizer applied, will searches that fit the criteria, for
> example:
> > "to be or not to be"
> > will that search still return results form the earlier documents as well
> as
> > the new ones?  The idea is that a full re-index is going to be difficult,
> > so would rather do it over time by replacing large numbers of documents
> > incrementally.  Thanks,
> > Dave
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message