nutch-dev mailing list archives

From Fredrik Andersson <>
Subject Re: Nutch Suggestion? (Google like "did you mean")
Date Thu, 29 Sep 2005 07:24:25 GMT
Hi Jack!

I prefer these things to be driven by usage statistics rather than by the
content of the index. If you run a search engine and want any kind of
feedback, you will at least be saving every query entered. You can store
these in an index or database and compare the (potentially misspelled) query
against them using the Levenshtein metric. If my memory serves me right, a
Lucene FuzzyQuery uses this metric, so a good approach would be to keep a
Lucene index of |query,frequency| tuples (updated nightly, weekly, or
whatever), search that index with a FuzzyQuery at some defined minimum
similarity, and pick the most frequent matching query as the suggestion.
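
To make the idea concrete, here is a rough sketch in plain Python. It does
not use Lucene at all: an in-memory Counter stands in for the
|query,frequency| index, and a hand-written Levenshtein function stands in
for the FuzzyQuery. The names suggest, similarity and min_similarity, the
0.7 threshold, and the sample log are all made up for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise edit distance to 0..1 (an assumed scoring, not Lucene's)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def suggest(query: str, query_log: Counter, min_similarity: float = 0.7):
    """Return the most frequent logged query close to `query`, or None."""
    query = query.lower()
    candidates = [
        (freq, logged)
        for logged, freq in query_log.items()
        if logged != query and similarity(query, logged) >= min_similarity
    ]
    if not candidates:
        return None
    best_freq, best = max(candidates)
    # Only suggest something that users type more often than the query itself.
    if best_freq <= query_log.get(query, 0):
        return None
    return best


# Hypothetical query log; in practice this would be rebuilt nightly or weekly.
query_log = Counter({"lucene": 120, "lucent": 3, "nutch": 80, "luceen": 1})
print(suggest("lucen", query_log))
print(suggest("lucene", query_log))
```

Scanning every logged query like this is linear in the size of the log, so
it only illustrates the ranking rule (similar enough, then most frequent).
A real index would be needed to keep the lookup fast.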


On 9/29/05, Jack Tang <> wrote:
> Hi
> I really like Google's "Did you mean" and I notice that Nutch does not
> currently provide this function.
> In this article, author Tim White
> implemented suggestions using n-grams to generate a suggestion index. Do
> you think this would be a good fit for Nutch? I mean, the index in Nutch
> will be really huge. Or should it just provide some dictionaries, like
> Jazzy (LGPL) does?
> Thanks
> /Jack
> --
> Keep Discovering ... ...
