nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject Re: Hot Search! Re: Nutch Suggestion? (Google like "did you mean")
Date Mon, 12 Dec 2005 16:49:26 GMT
Hi Fredrik

Thanks for your reply:)
It is true that you can recommend the top-n most popular queries for
each indexed field. See this example:
http://www.business.com/index.asp?p=true (please select the "Job" tab).

However, I think betherebesquare.com is a bit different. My goal is to
recommend <what><when><where> combinations, i.e. multi-field queries,
and I think recommending such combined queries would be quite meaningful.

Is that possible in Nutch, or is there something I should look at?

/Jack
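A minimal sketch of the "hot search" idea for <what><when><where> queries, assuming the front end logs each structured query as a (what, when, where) tuple before any query parsing. The names `normalize`, `hot_combinations` and `hot_values` are hypothetical, and an in-memory `Counter` stands in for whatever store (a Lucene index or a database) would hold the statistics in practice. Complete three-field combinations and single-field values are counted from the same log, so both kinds of recommendation come from one source.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical log record: (what, when, where); None means the field was left blank.
Query = Tuple[Optional[str], Optional[str], Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Lower-case and collapse whitespace so 'Jazz  Club' and 'jazz club' count as one."""
    if value is None:
        return None
    cleaned = " ".join(value.lower().split())
    return cleaned or None


def hot_combinations(log: Iterable[Query], n: int = 3) -> List[Tuple[Query, int]]:
    """Top-n fully specified <what><when><where> queries, most frequent first."""
    counts: Counter = Counter()
    for what, when, where in log:
        key = (normalize(what), normalize(when), normalize(where))
        if None not in key:  # skip queries with an empty field
            counts[key] += 1
    return counts.most_common(n)


def hot_values(log: Iterable[Query], field: int, n: int = 3) -> List[Tuple[str, int]]:
    """Top-n values of one field (0 = what, 1 = when, 2 = where)."""
    counts: Counter = Counter()
    for query in log:
        value = normalize(query[field])
        if value is not None:
            counts[value] += 1
    return counts.most_common(n)
```

Counting only fully specified tuples for the combined ranking is a design choice; partially filled queries still contribute to the per-field counts.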


On 12/12/05, Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> Hi again, Jack.
>
> I don't see the problem with saving separate statistics for each field in
> your query. In my applications, I pass the query string down to the
> statistics index before it reaches QueryParser, i.e. I just save "foo bar",
> not "field1:foo field1:bar field2:foo field2:bar". If you have something
> similar to betherebesquare.com, it shouldn't be a problem to put the
> different fields (name, date and location) into three statistical indices
> and do a simultaneous (threaded) lookup on all three when a new query
> arrives, to make suggestions.
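The per-field lookup described above could look roughly like this. It is a sketch under assumptions: each field (name, date, location) gets its own in-memory frequency table standing in for a separate statistics index, `FieldStats` and `suggest_all` are hypothetical names, and the matching rule (most frequent stored entry that starts with the typed text) is only a placeholder for a real fuzzy or prefix search. As suggested above, the raw per-field text is stored, not the parsed `field:term` form.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


class FieldStats:
    """Stand-in for one per-field statistics index: raw query text -> frequency."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self.counts = {q.lower(): c for q, c in counts.items()}

    def suggest(self, typed: str) -> Optional[str]:
        """Most frequent stored entry starting with the typed text (placeholder rule)."""
        typed = typed.strip().lower()
        if not typed:
            return None
        matches = [q for q in self.counts if q.startswith(typed)]
        if not matches:
            return None
        return max(matches, key=lambda q: self.counts[q])


def suggest_all(indices: Dict[str, FieldStats], query: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Query every field's index concurrently, one task per field present in the query."""
    with ThreadPoolExecutor(max_workers=max(1, len(query))) as pool:
        futures = {
            field: pool.submit(indices[field].suggest, text)
            for field, text in query.items()
            if field in indices  # ignore fields that have no statistics index
        }
        return {field: future.result() for field, future in futures.items()}
```

Python threads only overlap while waiting on I/O, so the thread pool pays off when each lookup hits a real on-disk index; for purely in-memory tables like these it mainly shows the structure.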
> Speaking from experience, you might want to separate the working copy and
> the live copy of this statistical index, since you want exclusive read
> access to the live index, without anyone writing to it (and locking it)
> at the same time. During each low-traffic period, copy the built-up
> statistical index, optimize() it, and replace the current live index with
> the new copy.
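One way to sketch the working/live split, assuming an in-process store rather than two Lucene index directories: writes go to a mutable working copy under a lock, readers get an immutable snapshot, and `publish()` copies, compacts and swaps. `StatsIndexPair`, `record`, `live` and `publish` are hypothetical names, and dropping entries below `min_count` is only a stand-in for Lucene's `optimize()`, which merges index segments instead.

```python
import threading
from types import MappingProxyType
from typing import Dict, Mapping


class StatsIndexPair:
    """Working copy takes all writes; readers only see an immutable live snapshot."""

    def __init__(self) -> None:
        self._working: Dict[str, int] = {}
        self._write_lock = threading.Lock()
        self._live: Mapping[str, int] = MappingProxyType({})

    def record(self, query: str) -> None:
        """Called for every incoming query; touches only the working copy."""
        with self._write_lock:
            self._working[query] = self._working.get(query, 0) + 1

    def live(self) -> Mapping[str, int]:
        """Read-only view used for suggestions; readers never take the write lock."""
        return self._live

    def publish(self, min_count: int = 1) -> None:
        """Run in a low-traffic period: copy, compact, then swap in the new live copy."""
        with self._write_lock:
            snapshot = dict(self._working)
        # Stand-in for optimize() (assumption): drop rare entries from the copy.
        compacted = {q: c for q, c in snapshot.items() if c >= min_count}
        # Single reference rebinding: readers see the old snapshot or the new one.
        self._live = MappingProxyType(compacted)
```

In CPython, rebinding `self._live` is one reference assignment, so a reader that already holds the old snapshot keeps a consistent view while the new one is swapped in.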
>
>  Good luck,
>  Fredrik
>
>
> On 12/12/05, Jack Tang <himars@gmail.com> wrote:
> > Hi
> >
> > The approach is great for a single query field. How about multiple fields?
> > Say I want to make some recommendations (or show hot searches) for the
> > event search engine http://betherebesquare.com/ .
> >
> > Any great thought?
> >
> > /Jack
> >
> > On 9/29/05, Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> > > Hi Jack!
> > >
> > > I like these things to be driven by statistics rather than by the
> > > content of the index. If you run a search engine and want any kind of
> > > feedback, you will at least save all queries entered. You can store these
> > > in an index or database, and compute a Levenshtein distance between the
> > > (potentially misspelled) query and the stored ones. If my memory serves me
> > > right, a Lucene FuzzyQuery uses this metric, so a good approach would be
> > > to keep a Lucene index of |query,frequency| tuples (updated nightly,
> > > weekly, or whatever), simply search this index with a FuzzyQuery with some
> > > defined similarity, and pick the most frequent matching query as the
> > > suggestion.
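A small sketch of that suggestion step, with a plain dictionary of |query,frequency| pairs standing in for the Lucene statistics index and an explicit Levenshtein distance with a `max_distance` cutoff standing in for FuzzyQuery's similarity threshold. `levenshtein` and `did_you_mean` are hypothetical names. The rule that a suggestion must be more frequent than the input itself is an added assumption, so that a correctly spelled popular query is not "corrected" into a different one.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def did_you_mean(query: str, log: Dict[str, int], max_distance: int = 2) -> Optional[str]:
    """Most frequent logged query within max_distance edits, if more popular than the input."""
    query = " ".join(query.lower().split())
    best: Optional[str] = None
    best_freq = log.get(query, 0)  # a suggestion must beat the input's own frequency
    for logged, freq in log.items():
        if freq > best_freq and logged != query and levenshtein(query, logged) <= max_distance:
            best, best_freq = logged, freq
    return best
```

A linear scan is fine for a sketch; with a large query log, the fuzzy search over an index described above is what keeps this fast.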
> > >
> > >  Fredrik
> > >
> > > On 9/29/05, Jack Tang <himars@gmail.com> wrote:
> > > > Hi
> > > >
> > > > I really like Google's "Did you mean" feature, and I notice that Nutch
> > > > does not currently provide this function.
> > > >
> > > > In this article http://today.java.net/lpt/a/211 , author Tim White
> > > > implemented suggestions by using n-grams to build a suggestion index.
> > > > Do you think it is a good fit for Nutch? I mean, indexes in Nutch will
> > > > be really huge. Or should we just provide some dictionaries, as Jazzy
> > > > (LGPL) does?
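To illustrate the general n-gram idea (a sketch, not the code from the article): each dictionary word is split into overlapping character trigrams, an inverted index maps each trigram to the words containing it, and a misspelled word is matched against words that share grams, ranked by Jaccard overlap of their gram sets. The `^`/`$` boundary padding, the trigram size, the Jaccard ranking and the names `ngrams` and `NGramIndex` are all assumptions. Because the index is built over distinct words rather than documents, its size grows with the vocabulary, not with the number of pages crawled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def ngrams(word: str, n: int = 3) -> Set[str]:
    """Character n-grams, padded so the first and last letters get their own grams."""
    padded = f"^{word.lower()}$"
    if len(padded) <= n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NGramIndex:
    """Inverted index from n-gram to the dictionary words that contain it."""

    def __init__(self, words: Iterable[str], n: int = 3) -> None:
        self.n = n
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for word in words:
            for gram in ngrams(word, n):
                self.postings[gram].add(word.lower())

    def suggest(self, word: str, limit: int = 3) -> List[str]:
        """Candidate words sharing at least one gram, best Jaccard overlap first."""
        grams = ngrams(word, self.n)
        candidates: Set[str] = set()
        for gram in grams:
            candidates |= self.postings.get(gram, set())

        def score(candidate: str) -> float:
            other = ngrams(candidate, self.n)
            return len(grams & other) / len(grams | other)

        # Ties are broken alphabetically so results are deterministic.
        ranked = sorted(candidates, key=lambda c: (-score(c), c))
        return ranked[:limit]
```

The top few candidates could then be re-ranked by edit distance or by how often they appear in the query log.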
> > > >
> > > > Thanks
> > > > /Jack
> > > > --
> > > > Keep Discovering ... ...
> > > > http://www.jroller.com/page/jmars
> > > >
> > >
> > >
> >
> >
> >
>
>


