lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Indexing from Nutch crawl
Date Mon, 18 Apr 2011 12:21:58 GMT
And are you really sure there's a Solr instance running with an update
handler at http://localhost:8080/wombra/data/update? Anyway, your URL is
somewhat uncommon in Solr land. It's usually something like:

http://<host>:<port>/solr/[<core>]/update/
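
A quick way to check this before re-running the indexer is to build the conventional URL and see what the server answers. The following is only a minimal sketch: the helper names solr_update_url and handler_status are made up for illustration, and it assumes a plain HTTP GET is enough to tell "nothing mapped here" (404) apart from "a handler exists".

```python
from typing import Optional
import urllib.error
import urllib.request


def solr_update_url(host: str, port: int, core: Optional[str] = None) -> str:
    """Build the conventional URL: http://<host>:<port>/solr/[<core>]/update"""
    parts = ["solr"]
    if core:
        parts.append(core)  # the core segment is optional (single-core setups omit it)
    parts.append("update")
    return f"http://{host}:{port}/" + "/".join(parts)


def handler_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code the server gives for url.

    404 means nothing is mapped at that path. A refused connection
    (no server listening at all) raises urllib.error.URLError instead.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers arrive as HTTPError; the code is what we want.
        return err.code


if __name__ == "__main__":
    url = solr_update_url("localhost", 8080)
    print(url)
    # Needs a running servlet container, so it is left commented out:
    # print(handler_status(url))
```

A 404 from handler_status matches the "Not Found" in the quoted log. Note that, judging by that log, Nutch is given the base URL and the client appends /update itself.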

On Monday 18 April 2011 14:03:53 McGibbney, Lewis John wrote:
> Hi Markus,
> 
> hadoop.log from beginning of solr commands as follows
> 
> 2011-04-18 11:27:05,480 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-04-18 11:27:05
> 2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> 2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
> 2011-04-18 11:27:05,562 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111549
> 2011-04-18 11:27:05,656 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418111603
> 2011-04-18 11:27:05,660 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112359
> 2011-04-18 11:27:05,661 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20110418112526
> 2011-04-18 11:27:06,065 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2011-04-18 11:27:06,282 INFO  plugin.PluginRepository - Plugins: looking in: /home/lewis/branch-1.3/runtime/local/plugins
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Registered Plugins:
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
> 2011-04-18 11:27:06,396 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2011-04-18 11:27:06,399 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:06,401 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:06,571 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:06,571 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:06,727 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:06,727 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:06,890 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:06,890 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:07,085 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:07,085 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:07,287 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:07,288 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:07,531 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:07,531 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:07,754 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:07,754 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:07,949 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:07,949 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:08,150 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:08,151 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:08,427 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:08,428 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:08,644 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:08,644 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:08,853 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:08,855 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,055 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,055 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,279 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,279 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,492 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,494 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,699 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,699 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,904 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,905 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:09,966 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-04-18 11:27:09,966 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: content dest: content
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: site dest: site
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: title dest: title
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: host dest: host
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: url dest: id
> 2011-04-18 11:27:10,021 INFO  solr.SolrMappingReader - source: url dest: url
> 2011-04-18 11:27:10,394 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
> 
> Not Found
> 
> request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-04-18 11:27:11,033 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 2011-04-18 11:27:11,869 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-04-18 11:27:11
> 2011-04-18 11:27:11,870 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8080/wombra/data
> 2011-04-18 11:27:13,048 INFO  solr.SolrClean - SolrClean: starting at 2011-04-18 11:27:13
> 2011-04-18 11:27:13,888 INFO  solr.SolrClean - SolrClean: deleting 5 documents
> 2011-04-18 11:27:13,992 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
> 
> Not Found
> 
> request: http://localhost:8080/wombra/data/update?wt=javabin&version=1
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.nutch.indexer.solr.SolrClean$SolrDeleter.close(SolrClean.java:115)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 
> 
> ________________________________________
> From: Markus Jelsma [markus.jelsma@openindex.io]
> Sent: 18 April 2011 11:59
> To: solr-user@lucene.apache.org
> Cc: McGibbney, Lewis John
> Subject: Re: Indexing from Nutch crawl
> 
> Can you include hadoop.log output? Likely the other commands fail as well
> but don't write the exception to stdout.
> 
> 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
