     [ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1104:
----------------------------------------

    Fix Version/s:     (was: 2.1)
                       2.2

> Port issues from trunk NutchGora branch
> ---------------------------------------
>
>                 Key: NUTCH-1104
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1104
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: nutchgora
>            Reporter: Markus Jelsma
>             Fix For: 2.2
>
>
> Umbrella issue for tracking issues that should be ported from 1.x trunk to the NutchGora branch. Please mark ported issues by modifying this description.
> NOT YET PORTED:
> * NUTCH-809 Parse-metatags plugin
> * NUTCH-987 Support HTTP auth for Solr communication
> * NUTCH-1028 Log parser keys
> * NUTCH-1036 Solr jobs should increment counters in Reporter
> * NUTCH-1057 Make fetcher thread time out configurable
> * NUTCH-1067 Configure minimum throughput for fetcher
> * NUTCH-1101 Options to purge db_gone records in updatedb
> * NUTCH-1102 Fetcher, rely on fetcher.parse directive only
> * NUTCH-1105 MaxContentLength option for index-basic
> * NUTCH-940 Static field plugin
> * NUTCH-1094 create comprehensive documentation for Nutch 2.0 trunk
> * NUTCH-1207 ParserChecker to output signature
> * NUTCH-1090 InvertLinks should inform when ignoring internal links
> * NUTCH-1174 Outlinks are not properly normalized
> * NUTCH-1203 ParseSegment to show number of milliseconds per parse
> * NUTCH-1173 DomainStats doesn't count db_not_modified
> * NUTCH-1155 Host/domain limit in generator is generate.max.count+1
> * NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex
> * NUTCH-1142 Normalization and filtering in WebGraph
> * NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file
> * NUTCH-1195 Add Solr 4x (trunk) example schema
> * NUTCH-1141 Configurable Fetcher queue depth
> * NUTCH-1214 DomainStats tool should be named for what it's doing
> * NUTCH-1213 Pass additional SolrParams when indexing to Solr
> * NUTCH-1211 URLFilterChecker command line help doesn't inform user of STDIN requirements
> * NUTCH-1231 Upgrade to Tika 1.0
> * NUTCH-1230 MimeType API deprecated and breaks with Tika 1.0
> * NUTCH-1235 Upgrade to new Hadoop 0.20.205.0
> * NUTCH-1184 Fetcher to parse and follow Nth degree outlinks
> PORTED:
> * No issues yet
> NOT GOING TO BE PORTED:
> * No issues, explain why it should not be ported

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira