lucene-solr-user mailing list archives

From Olivier Austina <olivier.aust...@gmail.com>
Subject Re: Fast autocomplete for large dataset
Date Sun, 02 Aug 2015 04:41:24 GMT
Thank you, Erick, for your replies and the link.

Regards
Olivier
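For reference, here is a minimal Python sketch of how the two Solr features discussed in the quoted thread are typically queried over HTTP: TermsComponent with `terms.prefix`, and the suggester (SuggestComponent). The base URL, the core name `mycore`, the `/terms` and `/suggest` handler paths, and the dictionary name are assumptions that depend on how `solrconfig.xml` is set up. The code only builds request URLs and does not contact a server.

```python
from urllib.parse import urlencode

# Assumption: hypothetical Solr base URL and core name; adjust for your deployment.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def terms_prefix_url(field: str, prefix: str, limit: int = 10) -> str:
    """Build a TermsComponent request for up to `limit` indexed terms in
    `field` that start with `prefix` (assumes a /terms handler exists)."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/terms?{urlencode(params)}"


def suggest_url(dictionary: str, text: str, count: int = 10, build: bool = False) -> str:
    """Build a SuggestComponent request (assumes a /suggest handler exists
    and `dictionary` names a suggester configured in solrconfig.xml)."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.q": text,
        "suggest.count": count,
        "wt": "json",
    }
    if build:
        # Rebuilding reads the stored field of every document, which is
        # the expensive step described in the thread.
        params["suggest.build"] = "true"
    return f"{SOLR_CORE_URL}/suggest?{urlencode(params)}"


if __name__ == "__main__":
    print(terms_prefix_url("title", "auto"))
    print(suggest_url("mySuggester", "new yo"))
```

TermsComponent returns indexed terms, so completing whole phrases generally requires a field whose values are indexed as single, untokenized terms. Passing `suggest.build=true` triggers the full read of stored values, so it is best kept out of the per-keystroke request path.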


2015-08-02 3:47 GMT+02:00 Erick Erickson <erickerickson@gmail.com>:

> Here's some background:
>
> http://lucidworks.com/blog/solr-suggester/
>
> Basically, the limitation is that to build the suggester all docs in
> the index need to be read to pull out the stored field and build
> either the FST or the sidecar Lucene index, which can be a _very_
> costly operation (as in minutes/hours for a large dataset).
>
> bq: The requirement is that the autocomplete should be fast (not
> slowed down by the volume of data as the dataset becomes bigger)
>
> Well, in some alternate universe this may be possible. But the larger
> the corpus, the slower the processing will be; there's just no way
> around that. Whether it's fast enough for your application is a better
> question ;).
>
> Best,
> Erick
>
>
> On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina
> <olivier.austina@gmail.com> wrote:
> > Thank you, Erick,
> >
> > I would like to implement autocomplete for a large dataset. The
> > autocomplete should show the phrase or the question the user wants as
> > the user types. The requirement is that the autocomplete should be fast
> > (not slowed down by the volume of data as the dataset becomes bigger)
> > and easy to maintain. The autocomplete can have its own Solr server. It
> > is an autocomplete like any other, but it needs to be fast and easy to
> > maintain.
> >
> > What are the limitations of the suggesters mentioned in the article?
> > Thank you.
> >
> > Regards
> > Olivier
> >
> >
> > 2015-08-01 19:41 GMT+02:00 Erick Erickson <erickerickson@gmail.com>:
> >
> >> Not really. There's no need to use ngrams as the article suggests if
> >> TermsComponent does what you need. That is why I asked you what
> >> autocomplete means in your context, which you have not clarified. Have
> >> you even looked at TermsComponent? Especially the terms.prefix option?
> >>
> >> TermsComponent has its limitations, but performance isn't one of them.
> >> The suggesters mentioned in the article have other limitations. It's
> >> really useless to discuss those limitations, though, until the problem
> >> you're trying to solve is clearly stated.
> >> On Aug 1, 2015 1:01 PM, "Olivier Austina" <olivier.austina@gmail.com>
> >> wrote:
> >>
> >> > Thank you, Erick, for your reply.
> >> > If I understand correctly, these approaches use the index to hold
> >> > terms, so as the index grows bigger, performance could become an
> >> > issue. Is that right? Could you please check this article
> >> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to
> >> > see what I mean? Thank you.
> >> >
> >> > Regards
> >> > Olivier
> >> >
> >> >
> >> > 2015-08-01 17:42 GMT+02:00 Erick Erickson <erickerickson@gmail.com>:
> >> >
> >> > > Well, defining what you mean by "autocomplete" would be a start. If
> >> > > it's just a user typing some letters and you suggesting the next N
> >> > > terms in the list, TermsComponent will fix you right up.
> >> > >
> >> > > If it's more complicated, the AutoSuggest functionality might help.
> >> > >
> >> > > If it's correcting spelling, there's the spellchecker.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > >
> >> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
> >> > > <olivier.austina@gmail.com> wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I am looking for a fast and easy-to-maintain way to do autocomplete
> >> > > > for a large dataset in Solr. I have heard about the Ternary Search
> >> > > > Tree (TST) <https://en.wikipedia.org/wiki/Ternary_search_tree>.
> >> > > > But I would like to know if there is something I missed, such as
> >> > > > best practices or new Solr features. Any suggestion is welcome.
> >> > > > Thank you.
> >> > > >
> >> > > > Regards
> >> > > > Olivier
> >> > >
> >> >
> >>
>
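One way to see why prefix lookup against a term list stays cheap as the corpus grows is that, once the terms are sorted, you only need to locate the first term that is greater than or equal to the prefix and then read forward while terms still match. The sketch below illustrates that idea on an in-memory sorted Python list. It is an analogy, not how Lucene actually stores its term dictionary, and `prefix_completions` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import List


def prefix_completions(sorted_terms: List[str], prefix: str, n: int = 10) -> List[str]:
    """Return up to n terms from an already-sorted list that start with prefix.

    Cost is O(log N) to find the start position plus O(k) to read the
    k matches, independent of how many non-matching terms exist.
    """
    results: List[str] = []
    i = bisect_left(sorted_terms, prefix)  # index of first term >= prefix
    while i < len(sorted_terms) and len(results) < n and sorted_terms[i].startswith(prefix):
        results.append(sorted_terms[i])
        i += 1
    return results


terms = sorted(["suggester", "solr", "spell", "sort", "solrcloud", "suggest"])
print(prefix_completions(terms, "so"))
```

The list must be kept sorted for this to work, which mirrors the fact that an index's term dictionary is built once and then queried many times.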

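The Ternary Search Tree mentioned in the original question can be sketched as follows. This is a minimal, illustrative Python version; the class and method names are hypothetical and not part of Solr or Lucene. Each node holds one character and three children for smaller, equal, and larger characters. Completion walks to the node matching the last character of the prefix, then collects every word below it in lexicographic order.

```python
from typing import List, Optional


class _Node:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None  # subtree of smaller characters
        self.eq: Optional["_Node"] = None  # next character of the word
        self.hi: Optional["_Node"] = None  # subtree of larger characters
        self.is_end = False                # a word ends at this node


class TernarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, word: str) -> None:
        if not word:
            return
        if self.root is None:
            self.root = _Node(word[0])
        node, i = self.root, 0
        while True:
            ch = word[i]
            if ch < node.ch:
                node.lo = node.lo or _Node(ch)
                node = node.lo
            elif ch > node.ch:
                node.hi = node.hi or _Node(ch)
                node = node.hi
            else:
                i += 1
                if i == len(word):
                    node.is_end = True
                    return
                node.eq = node.eq or _Node(word[i])
                node = node.eq

    def _find(self, prefix: str) -> Optional[_Node]:
        # Return the node for the last character of prefix, or None.
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    return node
                node = node.eq
        return None

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        if not prefix:
            return []
        node = self._find(prefix)
        if node is None:
            return []
        out: List[str] = [prefix] if node.is_end else []
        self._collect(node.eq, prefix, out, limit)
        return out[:limit]

    def _collect(self, node: Optional[_Node], path: str, out: List[str], limit: int) -> None:
        # In-order traversal (lo, node, eq, hi) yields words in sorted order.
        if node is None or len(out) >= limit:
            return
        self._collect(node.lo, path, out, limit)
        if len(out) >= limit:
            return
        if node.is_end:
            out.append(path + node.ch)
        self._collect(node.eq, path + node.ch, out, limit)
        self._collect(node.hi, path, out, limit)


tst = TernarySearchTree()
for w in ["car", "card", "care", "cart", "cat", "dog"]:
    tst.insert(w)
print(tst.complete("ca"))
```

Lookup cost depends mainly on the prefix length and the tree's branching rather than on scanning every entry, but the tree still has to be built from every term up front, which is the same kind of build-time cost described in the thread for the FST-based suggester.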