nutch-dev mailing list archives

From "Asitang Mishra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.10.2
Date Wed, 19 Aug 2015 16:32:46 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703291#comment-14703291 ]

Asitang Mishra commented on NUTCH-1486:
---------------------------------------

Hey Lewis,
Your fix for the jar soup did not work for the Naive Bayes plugin: at runtime it could not find
the Mahout classes it needs. Here is what I got:

java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
	at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:340)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
	... 9 more
2015-08-19 09:27:41,936 ERROR naivebayes.NaiveBayesParseFilter - Error occured while training::
java.lang.IllegalStateException: Job failed!
	at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
	at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
	at org.apache.nutch.parsefilter.naivebayes.NaiveBayesClassifier.createModel(NaiveBayesClassifier.java:99)
	at org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter.train(NaiveBayesParseFilter.java:93)
	at org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter.setConf(NaiveBayesParseFilter.java:148)
	at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
	at org.apache.nutch.plugin.PluginRepository.getOrderedPlugins(PluginRepository.java:441)
	at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:35)
	at org.apache.nutch.parse.html.HtmlParser.setConf(HtmlParser.java:343)
	at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
	at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:136)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:104)
	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:46)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
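
In the first trace the lookup goes through Configuration.getClassByName and ends in sun.misc.Launcher$AppClassLoader, so the class is being resolved by the application class loader. One way to narrow this down is to check which loaders can actually see the Mahout class. The snippet below is only a minimal sketch: ClassVisibilityCheck and isVisible are hypothetical names, not part of Nutch, Hadoop, or Mahout, and the idea that the Mahout jar is visible only to the plugin's own class loader is an assumption that this check would help confirm or rule out.

```java
// Hypothetical diagnostic helper; not part of Nutch, Hadoop, or Mahout.
class ClassVisibilityCheck {

    // Returns true if the given loader (or one of its parents) can resolve the class name.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class only, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the ClassNotFoundException above.
        String name = args.length > 0
            ? args[0]
            : "org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper";

        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();

        System.out.println("system class loader sees it:  " + isVisible(name, system));
        System.out.println("context class loader sees it: " + isVisible(name, context));
    }
}
```

Run with the same classpath as the failing job. If the system class loader reports false, the jar is missing from the application classpath, which would be consistent with the trace.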



> Upgrade to Solr 4.10.2
> ----------------------
>
>                 Key: NUTCH-1486
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1486
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.6, 2.1
>         Environment: Solr 4.0, Nutch trunk 1.6-SNAPSHOT & probably 2.2-SNAPSHOT
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>
>         Attachments: NUTCH-1486-1.8.patch, NUTCH-1486-1.9-trunk.patch, NUTCH-1486-2.x-v3.patch,
NUTCH-1486-2.x.patch, NUTCH-1486-2.x.v2.patch, NUTCH-1486-nutchgora.patch, NUTCH-1486-trunk.patch,
NUTCH-1486-trunk.v2.patch, NUTCH-1486-trunk.v3.patch, NUTCH-1486-trunkv4.patch, NUTCH-1486-trunkv5.patch
>
>
> When attempting to configure a four-core Solr 4.0 multicore instance with the Nutch schema-solr4.xml file, I get the following exceptions.
> This has been discussed previously. As I see it, we have two options:
> 1. Keep maintaining both schema options
> 2. Ditch the more complex schema-solr4.xml in favour of vanilla schema.xml
> Thoughts?
> {code}
> SEVERE: Unable to create core: collection4
> org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist
in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not
exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field
must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_
does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema,
using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Unable to use updateLog: _version_field
must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_
does not exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field
must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_
does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema,
using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: user.dir=/home/lewis/ASF/solr/example
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init() done
> 2012-11-01 16:26:15.228:INFO:oejs.AbstractConnector:Started SocketConnector@0.0.0.0:8983
> {code}
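
The SolrException quoted above states the requirement directly: the schema must declare a _version_ field with indexed="true", stored="true", and multiValued="false". A schema file can be checked for that before a core is started. The following is a rough sketch only: VersionFieldCheck and hasUsableVersionField are hypothetical names, the inline schema strings are made-up examples, and the check reads only attributes written explicitly on the field element (it does not follow defaults inherited from the fieldType, which Solr itself would take into account).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Solr or Nutch.
class VersionFieldCheck {

    // True if the schema XML declares _version_ as indexed, stored, and not multiValued,
    // judging only by attributes set explicitly on the <field> element.
    static boolean hasUsableVersionField(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if ("_version_".equals(field.getAttribute("name"))) {
                    return "true".equals(field.getAttribute("indexed"))
                        && "true".equals(field.getAttribute("stored"))
                        && !"true".equals(field.getAttribute("multiValued"));
                }
            }
            return false; // no _version_ field declared at all
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up minimal schemas for illustration only.
        String withVersion =
            "<schema><fields>"
            + "<field name=\"_version_\" type=\"long\" indexed=\"true\" stored=\"true\"/>"
            + "</fields></schema>";
        String withoutVersion =
            "<schema><fields>"
            + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
            + "</fields></schema>";

        System.out.println("with _version_:    " + hasUsableVersionField(withVersion));
        System.out.println("without _version_: " + hasUsableVersionField(withoutVersion));
    }
}
```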



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
