nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (NUTCH-1673) Title isn't reset in MoreIndexingFilter
Date Wed, 27 Nov 2013 10:17:35 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1673.
-----------------------------------------

    Resolution: Fixed

Committed @revision 1545982 in 2.x HEAD
Thank you [~tiennm] for the patch.

> Title isn't reset in MoreIndexingFilter
> ---------------------------------------
>
>                 Key: NUTCH-1673
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1673
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 2.2.1
>            Reporter: Nguyen Manh Tien
>             Fix For: 2.3
>
>         Attachments: NUTCH-1673.patch
>
>
> In the resetTitle function, the new title is added to the doc without removing the old one.
> We need to remove the old title before adding the new one. Currently this results in an
> error when indexing to Solr if the title field is not a multivalued field.
> private NutchDocument resetTitle(NutchDocument doc, WebPage page, String url) {
> ...
>     for (int i = 0; i < patterns.length; i++) {
>       if (matcher.contains(contentDisposition.toString(), patterns[i])) {
> ...
>         doc.add("title", result.group(1));
>         break;
>       }
>     }
>     return doc;
>   }
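
A minimal sketch of the remove-before-add idea described above, not the committed patch. SimpleDoc is a hypothetical stand-in for the index document, not the Nutch class. The filename pattern is an assumption, since the original patterns are elided, and java.util.regex is used here in place of the matcher in the quoted code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetTitleSketch {

  /** Hypothetical stand-in for a multi-valued index document (not the Nutch class). */
  static class SimpleDoc {
    private final Map<String, List<String>> fields = new HashMap<>();

    /** Appends a value to the field; existing values are kept. */
    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Drops every value stored under the field name. */
    void removeField(String name) {
      fields.remove(name);
    }

    List<String> getValues(String name) {
      return fields.getOrDefault(name, new ArrayList<>());
    }
  }

  // Assumed pattern for illustration only; the filter's real patterns are not shown in the report.
  private static final Pattern FILENAME = Pattern.compile("filename=\"?([^\";]+)\"?");

  /** Replaces the title with the Content-Disposition filename, if one is present. */
  static SimpleDoc resetTitle(SimpleDoc doc, String contentDisposition) {
    if (contentDisposition == null) {
      return doc;
    }
    Matcher m = FILENAME.matcher(contentDisposition);
    if (m.find()) {
      // Remove the old title first so the field ends up with exactly one value.
      doc.removeField("title");
      doc.add("title", m.group(1));
    }
    return doc;
  }

  public static void main(String[] args) {
    SimpleDoc doc = new SimpleDoc();
    doc.add("title", "Title from HTML");
    resetTitle(doc, "attachment; filename=\"report.pdf\"");
    System.out.println(doc.getValues("title")); // prints [report.pdf]
  }
}
```

Without the removeField call, the document in this sketch would carry two title values, which is the situation a single-valued title field rejects.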



--
This message was sent by Atlassian JIRA
(v6.1#6144)
