lucene-solr-user mailing list archives

From "McGibbney, Lewis John" <Lewis.McGibb...@gcu.ac.uk>
Subject RE: Indexing from Nutch crawl
Date Mon, 18 Apr 2011 12:03:53 GMT
Hi Markus,

Here is the hadoop.log output, starting from the first of the Solr commands:

2011-04-18 11:27:05,480 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-04-18 11:27:05
2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111549
2011-04-18 11:27:05,656 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111603
2011-04-18 11:27:05,660 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112359
2011-04-18 11:27:05,661 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112526
2011-04-18 11:27:06,065 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-04-18 11:27:06,282 INFO  plugin.PluginRepository - Plugins: looking in: /home/lewis/branch-1.3/runtime/local/plugins
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Registered Plugins:
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Registered Extension-Points:
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2011-04-18 11:27:06,399 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,401 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,571 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,571 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,727 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,727 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:06,890 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:06,890 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,085 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,085 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,287 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,288 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,531 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,531 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,754 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,754 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:07,949 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:07,949 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,150 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,151 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,427 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,428 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,644 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,644 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:08,853 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:08,855 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,055 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,055 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,279 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,279 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,492 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,494 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,699 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,699 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,904 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,905 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:09,966 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-04-18 11:27:09,966 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: content dest: content
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: site dest: site
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: title dest: title
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: host dest: host
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: segment dest: segment
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: boost dest: boost
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: digest dest: digest
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: url dest: id
2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: url dest: url
2011-04-18 11:27:10,394 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2011-04-18 11:27:11,869 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-04-18 11:27:11
2011-04-18 11:27:11,870 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/wombra/data
2011-04-18 11:27:13,048 INFO  solr.SolrClean - SolrClean: starting at 2011-04-18 11:27:13
2011-04-18 11:27:13,888 INFO  solr.SolrClean - SolrClean: deleting 5 documents
2011-04-18 11:27:13,992 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexer.solr.SolrClean$SolrDeleter.close(SolrClean.java:115)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
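
For a log like the one above, a short script can pull out the WARN/ERROR entries and the request URLs that came back "Not Found". What follows is only a minimal sketch in Python: the function name `summarize_failures` and the two regular expressions are assumptions based on the log layout shown in this message, not anything provided by Nutch or Solr.

```python
import re

# Assumed layout: "<date> <time>,<ms> <LEVEL>  <logger> - <message>", as in the log above.
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (WARN|ERROR)\s+(\S+) - (.*)"
)
# The exception message carries the failing URL on a line starting with "request:".
REQUEST_RE = re.compile(r"^request:\s*(\S+)")


def summarize_failures(log_text):
    """Return (problems, urls) found in hadoop.log text.

    problems: list of (level, logger, message) for every WARN or ERROR entry.
    urls: distinct request URLs from "request:" lines, in first-seen order.
    """
    problems = []
    urls = []
    for line in log_text.splitlines():
        entry = LEVEL_RE.match(line)
        if entry:
            problems.append(entry.groups())
            continue
        request = REQUEST_RE.match(line.strip())
        if request and request.group(1) not in urls:
            urls.append(request.group(1))
    return problems, urls


# A few lines copied from the log in this message.
sample = """\
2011-04-18 11:27:10,394 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Not Found

request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
"""

problems, urls = summarize_failures(sample)
for level, logger, message in problems:
    print(level, logger, message)
for url in urls:
    print("failing request:", url)
```

To run it against a real file, read the file into a string first (for example with `open("hadoop.log").read()`, where the path is whatever your Nutch runtime writes to) and pass that string to `summarize_failures`.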


________________________________________
From: Markus Jelsma [markus.jelsma@openindex.io]
Sent: 18 April 2011 11:59
To: solr-user@lucene.apache.org
Cc: McGibbney, Lewis John
Subject: Re: Indexing from Nutch crawl

Can you include hadoop.log output? Likely the other commands fail as well but
don't write the exception to stdout.

