nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (NUTCH-72) Query basic filter with correction feature
Date Fri, 01 Apr 2011 14:37:06 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-72.
------------------------------

    Resolution: Won't Fix

> Query basic filter with correction feature
> ------------------------------------------
>
>                 Key: NUTCH-72
>                 URL: https://issues.apache.org/jira/browse/NUTCH-72
>             Project: Nutch
>          Issue Type: New Feature
>          Components: searcher
>         Environment: lucene
>            Reporter: Christophe Noel
>         Attachments: querycorrectionplugin.zip
>
>
> This plugin extends the query-basic plugin with a correction feature.
> Lucene includes a FuzzyQuery feature, which searches not only for exactly matching
terms but also for very similar terms.
> This plugin should be used instead of query-basic by people looking for an easy way
to correct users' queries.
> The Correction Query Plugin can be used as follows:
> Solution 1: If you want to search for very similar terms, add autocorrectionmod as
the first term of the query (example: 'nutch engine' -> 'autocorrectionmod nutch engine').
> Solution 2: Create a new search.jsp page that includes a "correction" checkbox
(<input type="checkbox" name="autocorrection" value="true"> can automatically add 'autocorrectionmod'
as the first term of the query).
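
The two solutions above can be sketched as follows. This is only an illustration in Python, not the plugin's code: the function names build_query and parse_query are hypothetical, and the sketch assumes that terms are separated by whitespace and that the marker counts only when it is the very first term.

```python
from typing import List, Tuple

MARKER = "autocorrectionmod"  # marker term named in the issue description


def build_query(user_query: str, autocorrection: bool = False) -> str:
    """Prepend the marker when the (hypothetical) checkbox value is true."""
    terms = user_query.split()
    if autocorrection and (not terms or terms[0] != MARKER):
        terms.insert(0, MARKER)
    return " ".join(terms)


def parse_query(query: str) -> Tuple[bool, List[str]]:
    """Return (fuzzy_enabled, remaining_terms) for a raw query string."""
    terms = query.split()
    if terms and terms[0] == MARKER:
        # The marker only switches the mode; it is not a search term itself.
        return True, terms[1:]
    return False, terms
```

With this sketch, a ticked checkbox turns 'nutch engine' into 'autocorrectionmod nutch engine', and the query filter then strips the marker before building the fuzzy clauses.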
> FuzzyQuery has a big problem: it is very slow on large indexes!
> So the Correction Query Plugin works as follows:
> - it is not useful for big indexes
> - it only works for words of 5 characters or more
> - it only looks for words that share the first 2 characters (to improve performance,
this should be set to 3 or 4)
> - it only accepts suffixes with a 65% match (the algorithm is Levenshtein)
> Please give your opinion about it.
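
The matching rules listed above can be sketched as follows. This is a minimal illustration in Python, not the plugin's or Lucene's implementation. The constant and function names are hypothetical, and the "65% match" is assumed here to mean 1 - distance / (length of the longer suffix), where the suffix is the part of the word after the shared prefix; the description does not state the exact formula.

```python
MIN_WORD_LENGTH = 5    # "words of 5 characters or more"
PREFIX_LENGTH = 2      # the first 2 characters must be identical
MIN_SIMILARITY = 0.65  # assumed reading of "65% match" on the suffix


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows only."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_candidate(query_term: str, index_term: str) -> bool:
    """Apply the length, prefix and suffix-similarity rules in that order."""
    if len(query_term) < MIN_WORD_LENGTH:
        return False
    if query_term[:PREFIX_LENGTH] != index_term[:PREFIX_LENGTH]:
        return False
    q_suffix = query_term[PREFIX_LENGTH:]
    t_suffix = index_term[PREFIX_LENGTH:]
    longest = max(len(q_suffix), len(t_suffix))
    if longest == 0:
        return True
    similarity = 1 - levenshtein(q_suffix, t_suffix) / longest
    return similarity >= MIN_SIMILARITY
```

The prefix check runs before the edit distance because it is cheap and discards most index terms early. That is presumably why raising the prefix length to 3 or 4 is suggested as a way to improve performance.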

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
