nutch-dev mailing list archives

From Fredrik Andersson <fidde.anders...@gmail.com>
Subject Re: Hot Search! Re: Nutch Suggestion? (Google like "did you mean")
Date Mon, 12 Dec 2005 17:14:36 GMT
Re!

Well, you have two choices. Either you store ONE index with (what, when,
where, frequency) or THREE indices with (what, frequency), (when, frequency)
and (where, frequency). If you choose the first approach you can just
concatenate the (what, when, where) strings into one string, which lets
your application run a FuzzyQuery against just that one index. I have no idea
what area your application is in, but something like that would suit
the site mentioned previously, and it is also a very speedy operation since
it takes just one query.
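
To make the single-index idea concrete, here is a minimal in-memory sketch.
It does not use Lucene at all: a plain map stands in for the index, and a
hand-written Levenshtein distance stands in for FuzzyQuery. The class name
CombinedSuggester, the "|" separator and the maxEdits parameter are
assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CombinedSuggester {
    // Logged queries flattened to one "what|when|where" key, mapped to how often each was seen.
    private final Map<String, Integer> frequencies = new LinkedHashMap<>();

    // Hypothetical key format: the three fields joined with "|" and lower-cased.
    static String key(String what, String when, String where) {
        return (what + "|" + when + "|" + where).toLowerCase();
    }

    void record(String what, String when, String where) {
        frequencies.merge(key(what, when, where), 1, Integer::sum);
    }

    // Dynamic-programming Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the most frequent stored key within maxEdits of the query, or null if none.
    String suggest(String what, String when, String where, int maxEdits) {
        String query = key(what, when, where);
        String best = null;
        int bestFreq = 0;
        for (Map.Entry<String, Integer> e : frequencies.entrySet()) {
            if (levenshtein(query, e.getKey()) <= maxEdits && e.getValue() > bestFreq) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```

Note that the edit budget is shared by all three fields here, so a typo in
"where" uses up tolerance that "what" might have needed. An exact match is
returned as its own suggestion, so the caller would have to skip that case.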

If you use multiple indices you can pinpoint which field is misspelled and
take other actions on that field (e.g. different searching techniques). It
should be more precise, since you fuzz only the one misspelled field instead
of all three, which is implicitly what happens in the first approach.
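
A rough sketch of the multiple-index variant, again with plain maps standing
in for the Lucene indices and a hand-written Levenshtein distance standing in
for FuzzyQuery. The names PerFieldSuggester, record and corrections are made
up for this example, and treating "value seen before" as "spelled correctly"
is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PerFieldSuggester {
    // One frequency table per field name, e.g. "what", "when", "where".
    private final Map<String, Map<String, Integer>> tables = new HashMap<>();

    void record(String field, String value) {
        tables.computeIfAbsent(field, f -> new HashMap<>())
              .merge(value.toLowerCase(), 1, Integer::sum);
    }

    // Dynamic-programming Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // For each queried field whose value was never logged, propose the most
    // frequent logged value within maxEdits. Fields that match exactly are skipped.
    Map<String, String> corrections(Map<String, String> query, int maxEdits) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> q : query.entrySet()) {
            Map<String, Integer> table = tables.get(q.getKey());
            if (table == null) continue;            // no statistics for this field
            String value = q.getValue().toLowerCase();
            if (table.containsKey(value)) continue; // seen before: assume it is spelled correctly
            String best = null;
            int bestFreq = 0;
            for (Map.Entry<String, Integer> e : table.entrySet()) {
                if (levenshtein(value, e.getKey()) <= maxEdits && e.getValue() > bestFreq) {
                    best = e.getKey();
                    bestFreq = e.getValue();
                }
            }
            if (best != null) out.put(q.getKey(), best);
        }
        return out;
    }
}
```

The returned map contains only the fields that look misspelled, which is
what lets the application treat each field differently.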

Hope it helps,
Fredrik

On 12/12/05, Jack Tang <himars@gmail.com> wrote:
>
> Hi Fredrik
>
> Thanks for your reply:)
> It is true that you can recommend the top-n most popular queries on
> each indexed field. See the example:
> http://www.business.com/index.asp?p=true (please select the "Job" tab).
>
> However, I think betherebesquare.com is a bit different. I mean, what if
> my goal is to recommend <what><when><where> -- the multi-field query?
> I think such a recommended query is quite meaningful.
>
> Is that possible in Nutch? Or is there something I should refer to?
>
> /Jack
>
>
> On 12/12/05, Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> > Hi again, Jack.
> >
> >  I don't see the problem with saving separate statistics for each field
> > in your query. In my applications, I pass the query string down to the
> > statistics index prior to QueryParser, i.e. I just save "foo bar", not
> > "field1:foo field1:bar field2:foo field2:bar". If you have something
> > similar to betherebesquare.com, it shouldn't be a problem to tuck the
> > different fields (name, date and location) into three statistical indices
> > and do a simultaneous (threaded) lookup on the three when getting a new
> > query, to make suggestions.
> >  Speaking from experience, you might want to separate the working copy
> > and the live copy of this statistical index, since you will sometimes want
> > exclusive read access to the live index without someone writing to it
> > (and locking it). Each low-traffic period, copy the built-up statistical
> > index, optimize() it, and replace the current live index with the new
> > copy.
> >
> >  Good luck,
> >  Fredrik
> >
> >
> > On 12/12/05, Jack Tang <himars@gmail.com> wrote:
> > > Hi
> > >
> > > The approach is great for a single query field. How about multiple
> > > fields? Say I want to make some recommendations (or show hot searches)
> > > for the event search engine http://betherebesquare.com/ .
> > >
> > > Any great thought?
> > >
> > > /Jack
> > >
> > > On 9/29/05, Fredrik Andersson <fidde.andersson@gmail.com> wrote:
> > > > Hi Jack!
> > > >
> > > >  I like these things to be driven by statistics rather than by the
> > > > content of the index. If you run a search engine, and want any kind
> > > > of feedback, you will at least save all queries entered. You can store
> > > > these in an index or database, and run a Levenshtein metric on the
> > > > potentially misspelled query. If my memory serves me right, a Lucene
> > > > FuzzyQuery uses this metric, so a good approach would be to keep a
> > > > Lucene index with |query,frequency| tuples (updated nightly, weekly,
> > > > or whatever), and simply search this index with a FuzzyQuery with some
> > > > defined similarity, and pick the most frequent query for suggestion.
> > > >
> > > >  Fredrik
> > > >
> > > > On 9/29/05, Jack Tang <himars@gmail.com> wrote:
> > > > > Hi
> > > > >
> > > > > I really like Google's "Did you mean", and I notice that Nutch
> > > > > does not currently provide this function.
> > > > >
> > > > > In this article, http://today.java.net/lpt/a/211 , author Tim White
> > > > > implemented suggestions using n-grams to generate a suggestion
> > > > > index. Do you think that is a good fit for Nutch? I mean, the index
> > > > > in Nutch will be really huge. Or should it just provide some
> > > > > dictionaries, as jazzy (LGPL) does?
> > > > >
> > > > > Thanks
> > > > > /Jack
> > > > > --
> > > > > Keep Discovering ... ...
> > > > > http://www.jroller.com/page/jmars
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Keep Discovering ... ...
> > > http://www.jroller.com/page/jmars
> > >
> >
> >
>
>
> --
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>
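
The working-copy/live-copy advice quoted above can be sketched in memory as
follows. This is only an analogy, assuming a map of query frequencies in
place of a Lucene index: the hypothetical publish() method plays the role of
"copy, optimize(), replace", and there is no counterpart to optimize() here.
The names QueryStats, log, liveFrequency and publish are invented for the
example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

class QueryStats {
    // Working copy: receives every logged query and may be written at any time.
    private final Map<String, Integer> working = new ConcurrentHashMap<>();

    // Live copy: a read-only snapshot, so readers never contend with writers.
    private final AtomicReference<Map<String, Integer>> live =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());

    // Writers only ever touch the working copy.
    void log(String query) {
        working.merge(query, 1, Integer::sum);
    }

    // Readers only ever touch the live snapshot.
    int liveFrequency(String query) {
        return live.get().getOrDefault(query, 0);
    }

    // Meant to be called during a low-traffic period: copy the working data
    // and swap the copy in as the new live snapshot in one atomic step.
    void publish() {
        live.set(Collections.unmodifiableMap(new HashMap<>(working)));
    }
}
```

Suggestions served from the live snapshot lag behind the logged queries
until the next publish(), which is the trade-off the quoted message accepts
in exchange for lock-free reads.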
