nutch-dev mailing list archives

From Fredrik Andersson <fidde.anders...@gmail.com>
Subject Re: Hot Search! Re: Nutch Suggestion? (Google like "did you mean")
Date Mon, 12 Dec 2005 10:01:42 GMT
Hi again, Jack.

I don't see a problem with keeping separate statistics for each field in
your query. In my applications, I pass the raw query string down to the
statistics index before it reaches QueryParser, i.e. I just save "foo bar",
not "field1:foo field1:bar field2:foo field2:bar". If you have something
like betherebesquare.com, it shouldn't be a problem to put the different
fields (name, date and location) into three statistical indices and do a
simultaneous (threaded) lookup on all three when a new query comes in, to
make suggestions.
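
A minimal sketch of that per-field lookup, under loud assumptions: plain
in-memory maps of logged query to frequency stand in for the Lucene
statistics indices, a hand-written Levenshtein distance stands in for
FuzzyQuery, and the class and method names (FieldSuggester, bestMatch,
suggest) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the per-field statistics indices: each field maps
// logged query strings to how often they were entered.
class FieldSuggester {
    private final Map<String, Map<String, Integer>> statsByField;

    FieldSuggester(Map<String, Map<String, Integer>> statsByField) {
        this.statsByField = statsByField;
    }

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Most frequent logged query within maxDistance edits of the input, or null if none.
    static String bestMatch(Map<String, Integer> stats, String query, int maxDistance) {
        String best = null;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : stats.entrySet()) {
            if (levenshtein(query, e.getKey()) <= maxDistance && e.getValue() > bestFreq) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    // Looks up every field concurrently; fields without a match are left out.
    Map<String, String> suggest(String query, int maxDistance) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, statsByField.size()));
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Map<String, Integer>> e : statsByField.entrySet()) {
                Map<String, Integer> stats = e.getValue();
                Callable<String> task = () -> bestMatch(stats, query, maxDistance);
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                String suggestion = e.getValue().get();
                if (suggestion != null) result.put(e.getKey(), suggestion);
            }
            return result;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("suggestion lookup failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

With real indices, each task would instead run a fuzzy search against that
field's statistics index and pick the hit with the highest stored frequency.
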
Speaking from experience, you will want to separate the working copy from
the live copy of this statistical index, so that the live index stays
available for reads without a writer occasionally locking it. During each
low-traffic period, copy the built-up statistical index, optimize() it, and
replace the current live index with the new copy.
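
A rough sketch of that working/live split, assuming the index is just a
directory of files. IndexPublisher and the "live-N" snapshot naming are
invented here, and the optimize() step is only marked with a comment since
it depends on your Lucene setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Stream;

// Hypothetical helper: writers only touch workingDir, readers only touch the
// directory returned by liveDir(), which is never modified after publishing.
class IndexPublisher {
    private final Path workingDir;
    private final Path snapshotRoot;
    private final AtomicReference<Path> live = new AtomicReference<>();
    private final AtomicLong generation = new AtomicLong();

    IndexPublisher(Path workingDir, Path snapshotRoot) {
        this.workingDir = workingDir;
        this.snapshotRoot = snapshotRoot;
    }

    // Current read-only snapshot, or null before the first publish.
    Path liveDir() {
        return live.get();
    }

    // Copies the working index into a fresh snapshot directory and then makes
    // it the live one. Assumes writers are paused while the copy runs.
    Path publish() {
        Path target = snapshotRoot.resolve("live-" + generation.incrementAndGet());
        try (Stream<Path> files = Files.walk(workingDir)) {
            // Files.walk visits a directory before its contents, so parents exist.
            files.forEach(src -> {
                Path dst = target.resolve(workingDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // This is where the copy would be optimized before going live.
        live.set(target); // readers see the new snapshot from here on
        return target;
    }
}
```

Old snapshots are left on disk in this sketch; cleaning them up once no
reader uses them is a separate job.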

Good luck,
Fredrik

On 12/12/05, Jack Tang <himars@gmail.com> wrote:
>
> Hi
>
> The approach is great for a single query field. How about multiple
> fields? Say I want to make recommendations (or show hot searches) for the
> event search engine http://betherebesquare.com/.
>
> Any thoughts?
>
> /Jack
>
> On 9/29/05, Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> > Hi Jack!
> >
> >  I like these things to be driven by statistics rather than content
> > of the index. If you run a search engine, and want any kind of
> > feedback, you will at least save all queries entered. You can store
> > these in an index or database, and run a Levenshtein metric on the,
> > potentially misspelled, query. If my memory serves me right, a Lucene
> > FuzzyQuery uses this metric, so a good approach would be to keep a
> > Lucene index with |query,frequency| tuples (updated nightly, weekly,
> > or whatever), and simply search this index with a FuzzyQuery with some
> > defined similarity, and pick the most frequent query for suggestion.
> >
> >  Fredrik
> >
> > On 9/29/05, Jack Tang <himars@gmail.com> wrote:
> > > Hi
> > >
> > > I really like Google's "Did you mean", and I notice that nutch does
> > > not currently provide this function.
> > >
> > > In this article http://today.java.net/lpt/a/211 , author Tim White
> > > implemented suggestions using n-grams to generate a suggestion index.
> > > Do you think it is a good fit for nutch? I mean, the index in nutch
> > > will be really huge. Or should we just provide some dictionaries, like
> > > jazzy (LGPL) does?
> > >
> > > Thanks
> > > /Jack
> > > --
> > > Keep Discovering ... ...
> > > http://www.jroller.com/page/jmars
> > >
> >
> >
>
>
> --
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>
