nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NUTCH-1486) Upgrade to Solr 4.10.2
Date Mon, 03 Aug 2015 22:26:05 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652610#comment-14652610 ]

Lewis John McGibbney edited comment on NUTCH-1486 at 8/3/15 10:25 PM:
----------------------------------------------------------------------

Patch for trunk. This patch touches several places:
* Corrects the SolrWriter classes within log4j.properties to use indexwriter.
* Removes schema-solr4.xml and moves all required fields over to schema.xml.
* Removes the bastard additional dependencies from ivy/ivy.xml (cf. NUTCH-2056, NUTCH-2058)
and adds them to the parsefilter-naivebayes plugin. Also upgrades the Mahout and Lucene APIs,
along with the accompanying dependencies, to play nicely with Lucene and Solr 4.10.2. Finally,
implements the correct plugin.xml runtime dependencies for this plugin as well.
* Removes the transitive dependency on org.apache.httpcomponents httpcore and httpclient
within index-geoip. These dependencies were leading to hellish classpath issues due to newer
implementations being used elsewhere. Also upgrades the index-geoip dependency to 2.3.1 and
implements the correct plugin.xml runtime dependencies.
* Introduces some new properties within nutch-default.xml which enable us to choose between
HttpSolrServer, CloudSolrServer, ConcurrentSolrServer or LBSolrServer. These have been documented
within nutch-site.xml and also within the describe() function of SolrWriter.
* Upgrades httpclient and httpcore across the board to >= 4.3.1, so that we avoid classpath
issues when indexing and when building custom plugins on top of Nutch which implement
newer interfaces for these dependencies.
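
To illustrate the server selection described in the list above, here is a minimal sketch of how a single configuration value might be mapped to one of the four client types. It is not taken from the patch: the property key `solr.server.type`, the values `http`, `cloud`, `concurrent` and `lb`, the fallback to HTTP when the property is unset, and all class names in the sketch are assumptions made for illustration. The SolrJ 4.x class names `ConcurrentUpdateSolrServer` and `LBHttpSolrServer` are used on the assumption that they are what the shorthand ConcurrentSolrServer and LBSolrServer refer to.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical mapping from a config value to a SolrJ 4.x client class name. */
enum SolrServerType {
  HTTP("HttpSolrServer"),
  CLOUD("CloudSolrServer"),
  CONCURRENT("ConcurrentUpdateSolrServer"),
  LB("LBHttpSolrServer");

  /** Simple name of the SolrJ client class this type stands for. */
  final String solrjClass;

  SolrServerType(String solrjClass) {
    this.solrjClass = solrjClass;
  }

  /** Parses a property value; a missing or blank value falls back to HTTP (an assumption). */
  static SolrServerType fromProperty(String value) {
    if (value == null || value.trim().isEmpty()) {
      return HTTP;
    }
    try {
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unknown Solr server type: " + value, e);
    }
  }
}

class SolrServerTypeDemo {
  // Assumed property key; check nutch-default.xml in the patch for the real name.
  static final String SERVER_TYPE_KEY = "solr.server.type";

  /** Resolves the server type from a configuration object. */
  static SolrServerType resolve(Properties conf) {
    return SolrServerType.fromProperty(conf.getProperty(SERVER_TYPE_KEY));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(SERVER_TYPE_KEY, "cloud");
    SolrServerType type = resolve(conf);
    System.out.println(type + " -> " + type.solrjClass);
  }
}
```

Keeping the choice in one enum means an unknown value fails fast with a readable message instead of silently falling through to a default client.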

[~asitang] can you please test out this patch along with the parsefilter-naivebayes? I want
to confirm that it works the same as, or similarly to, what you expect from your trained models.

@everyone else, I've tested indexing with this patch into Elasticsearch 1.5.0 and Apache Solr
4.10.2 and all is good. It would be very much appreciated if people could test before this patch
diverges too much from trunk.



> Upgrade to Solr 4.10.2
> ----------------------
>
>                 Key: NUTCH-1486
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1486
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.6, 2.1
>         Environment: Solr 4.0, Nutch trunk 1.6-SNAPSHOT & probably 2.2-SNAPSHOT
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>
>         Attachments: NUTCH-1486-1.8.patch, NUTCH-1486-1.9-trunk.patch, NUTCH-1486-2.x-v3.patch,
>                      NUTCH-1486-2.x.patch, NUTCH-1486-2.x.v2.patch, NUTCH-1486-nutchgora.patch,
>                      NUTCH-1486-trunk.patch, NUTCH-1486-trunk.v2.patch, NUTCH-1486-trunk.v3.patch,
>                      NUTCH-1486-trunkv4.patch
>
>
> When attempting to configure a Solr 4.0 multicore instance (4 cores) with the Nutch
> schema-solr4.xml file, I get the following exceptions.
> This has been discussed previously. As I see it, we have two options:
> 1. Keep maintaining both schema options
> 2. Ditch the more complex schema-solr4.xml in favour of vanilla schema.xml
> Thoughts?
> {code}
> SEVERE: Unable to create core: collection4
> org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: user.dir=/home/lewis/ASF/solr/example
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init() done
> 2012-11-01 16:26:15.228:INFO:oejs.AbstractConnector:Started SocketConnector@0.0.0.0:8983
> {code}
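
The exception in the log above states its own requirement: with the update log enabled, the schema must declare a `_version_` field that is indexed, stored and not multi-valued. As a minimal sketch of a pre-flight check for that condition, the snippet below parses a schema file's contents and looks for such a field. The class and method names are hypothetical, the sample schemas (including the `long` field type) are made up for the demo, and the check only reads attributes set directly on the `<field>` element, ignoring anything inherited from the field type, so it is an illustration rather than a description of how Solr itself validates the schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: checks a schema for the _version_ field named in the exception. */
class VersionFieldCheck {

  /**
   * Returns true if the schema XML declares a field named _version_ with
   * indexed="true", stored="true" and multiValued not set to "true".
   * Simplification: only attributes written on the field element are considered.
   */
  static boolean hasUsableVersionField(String schemaXml) {
    Document doc;
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      doc = builder.parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse schema XML", e);
    }
    NodeList fields = doc.getElementsByTagName("field");
    for (int i = 0; i < fields.getLength(); i++) {
      Element field = (Element) fields.item(i);
      if (!"_version_".equals(field.getAttribute("name"))) {
        continue;
      }
      return "true".equals(field.getAttribute("indexed"))
          && "true".equals(field.getAttribute("stored"))
          && !"true".equals(field.getAttribute("multiValued"));
    }
    return false; // no _version_ field declared at all
  }

  public static void main(String[] args) {
    // Made-up sample schemas for the demo, not the Nutch schema files.
    String without = "<schema name=\"demo\"><fields>"
        + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
        + "</fields></schema>";
    String with = "<schema name=\"demo\"><fields>"
        + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
        + "<field name=\"_version_\" type=\"long\" indexed=\"true\" stored=\"true\""
        + " multiValued=\"false\"/>"
        + "</fields></schema>";
    System.out.println("without _version_: " + hasUsableVersionField(without));
    System.out.println("with _version_:    " + hasUsableVersionField(with));
  }
}
```

Running a check like this against a schema before creating a core would surface the missing field without having to read through the Jetty startup trace.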



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)