tika-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: NER Parser tests behind proxy?
Date Tue, 24 Nov 2015 15:19:53 GMT
Gotcha Tim, OK that helps.

Thamme, can you try and test this behind a proxy so that we can
try and replicate what Tim is seeing?

As for packaging the models, that may be difficult for Stanford NER,
not only b/c of the license (GPLv3 [1], which is why we did it
as an optional runtime dependency, since we also did Apache
OpenNLP), but also b/c of the size of the models. The Apache OpenNLP
models are freely available, but no Maven packaging exists
for them.

We’ll get this figured out Tim.

Cheers,
Chris

[1] http://nlp.stanford.edu/software/corenlp.shtml

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: "Allison, Timothy B." <tallison@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Tuesday, November 24, 2015 at 6:07 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Cc: ThammeGowda Narayanaswamy <thammegowda.n@usc.edu>
Subject: RE: NER Parser tests behind proxy?

>Y, you do, but you (or I) can set the proxy for Maven correctly and
>(without the NER requirement) the build works fine.
>
>***WARNING, what I'm running into might very well just be user error in
>not telling Maven to pass the proxy info to Groovy...this is why I didn't
>open an issue :) I've done some googling, but haven't found an answer to
>this.***
>
>In response to Thamme's questions:
>>> Which is better?
>>> 1. List 'access to opennlp.sourceforge.net' as a requirement
>I have access without a problem via regular means, the problem is that
>Maven isn't passing proxy information into Groovy when it tries to make
>the call to get the document (I confirmed this by dumping system props
>within ModelGetter).  Perhaps we just document that you need to download
>the four model files manually and stick them in the right subdirectory if
>you are behind a proxy (ugly solution, but would probably work)?
>
>
>>>2. Package and deploy models as a maven artifact
>Are there licensing issues for the current models?  Are the current
>models ASLv2.0?  Would we need all four full models?  And, y, my
>suggestion was to build a very small model and push it to source control
>in the resources directory.
>
>All this said, 1) again, this could be user error and 2) the addition of
>Stanford NER is fantastic...Thank you for this addition!
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Monday, November 23, 2015 11:12 AM
>To: dev@tika.apache.org
>Cc: ThammeGowda Narayanaswamy <thammegowda.n@usc.edu>
>Subject: Re: NER Parser tests behind proxy?
>
>Hey Tim,
>
>Why shouldn’t we have to worry
>about connectivity outside of the Maven stuff? I mean clearly, if I
>install Tika on a new system today without a Maven repo, I must be
>connected to the internet, right?
>
>Cheers,
>Chris
>
>
>
>-----Original Message-----
>From: "Allison, Timothy B." <tallison@mitre.org>
>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>Date: Monday, November 23, 2015 at 8:03 AM
>To: "dev@tika.apache.org" <dev@tika.apache.org>
>Cc: ThammeGowda Narayanaswamy <thammegowda.n@usc.edu>
>Subject: RE: NER Parser tests behind proxy?
>
>>The problem comes down to: ModelGetter.groovy which is trying to grab:
>>${basedir}/src/test/resources/org/apache/tika/parser/ner/opennlp/ner-person.bin
>>
>>If we could build a small model (and I mean really small) and package
>>it with Tika, we wouldn't have to worry about http connectivity outside
>>of the usual maven stuff.
>>
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>Sent: Monday, November 23, 2015 10:52 AM
>>To: dev@tika.apache.org
>>Cc: ThammeGowda Narayanaswamy <thammegowda.n@usc.edu>
>>Subject: Re: NER Parser tests behind proxy?
>>
>>Hey Tim,
>>
>>I’m not seeing these of course b/c I’m not behind a proxy. Thamme, any
>>ideas?
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: "Allison, Timothy B." <tallison@mitre.org>
>>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>>Date: Thursday, November 19, 2015 at 5:36 PM
>>To: "dev@tika.apache.org" <dev@tika.apache.org>
>>Subject: NER Parser tests behind proxy?
>>
>>>My proxy is configured for git/maven/etc, but how do I configure it
>>>within the test so that I don't get this?
>>>
>>>GET : http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin ->
>>>tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-person.bin
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] Reactor Summary:
>>>[INFO]
>>>[INFO] Apache Tika parent ................................ SUCCESS [3.264s]
>>>[INFO] Apache Tika core .................................. SUCCESS [44.470s]
>>>[INFO] Apache Tika parsers ............................... FAILURE [1:56.462s]
>>>[INFO] Apache Tika XMP ................................... SKIPPED
>>>[INFO] Apache Tika serialization ......................... SKIPPED
>>>[INFO] Apache Tika batch ................................. SKIPPED
>>>[INFO] Apache Tika application ........................... SKIPPED
>>>[INFO] Apache Tika OSGi bundle ........................... SKIPPED
>>>[INFO] Apache Tika translate ............................. SKIPPED
>>>[INFO] Apache Tika server ................................ SKIPPED
>>>[INFO] Apache Tika examples .............................. SKIPPED
>>>[INFO] Apache Tika Java-7 Components ..................... SKIPPED
>>>[INFO] Apache Tika ....................................... SKIPPED
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] BUILD FAILURE
>>>[INFO] ------------------------------------------------------------------------
>>>[INFO] Total time: 2:45.245s
>>>[INFO] Finished at: Thu Nov 19 20:29:34 EST 2015
>>>[INFO] Final Memory: 52M/482M
>>>[INFO] ------------------------------------------------------------------------
>>>[ERROR] Failed to execute goal org.codehaus.groovy.maven:gmaven-plugin:1.0:execute (testSetup) on project tika-parsers: java.net.ConnectException: Connection refused: connect -> [Help 1]
>>>org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.groovy.maven:gmaven-plugin:1.0:execute (testSetup) on project tika-parsers: java.net.ConnectException: Connection refused: connect
>>>	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
>>>	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>>>	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>>>	at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>>>	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>>>	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
>>>	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>>>	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>>>	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>>>	at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>	at java.lang.reflect.Method.invoke(Method.java:497)
>>>	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>>>	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>>>	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>>>	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
>>>	at org.codehaus.classworlds.Launcher.main(Launcher.java:47)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>	at java.lang.reflect.Method.invoke(Method.java:497)
>>>	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>>Caused by: org.apache.maven.plugin.MojoExecutionException: java.net.ConnectException: Connection refused: connect
>>>	at org.codehaus.groovy.maven.plugin.MojoSupport.execute(MojoSupport.java:85)
>>>	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
>>>	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
>>>	... 25 more
>>>Caused by: org.codehaus.groovy.maven.feature.ComponentException: java.net.ConnectException: Connection refused: connect
>>>	at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.invokeMethod(ScriptExecutorSupport.java:162)
>>>	at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.execute(ScriptExecutorSupport.java:126)
>>>	at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.execute(ScriptExecutorSupport.java:73)
>>>	at org.codehaus.groovy.maven.plugin.execute.ExecuteMojo.process(ExecuteMojo.java:249)
>>>	at org.codehaus.groovy.maven.plugin.ComponentMojoSupport.doExecute(ComponentMojoSupport.java:60)
>>>	at org.codehaus.groovy.maven.plugin.MojoSupport.execute(MojoSupport.java:69)
>>>	... 27 more
>>>Caused by: java.net.ConnectException: Connection refused: connect
>>>	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1890)
>>>	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1885)
>>>	at java.security.AccessController.doPrivileged(Native Method)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1884)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1457)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>	at java.lang.reflect.Method.invoke(Method.java:497)
>>>	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229)
>>>	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
>>>	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:43)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
>>>	at ModelGetter.downloadFile(ModelGetter.groovy:64)
>>>	at ModelGetter$downloadFile.callCurrent(Unknown Source)
>>>	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:47)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:142)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:154)
>>>	at ModelGetter.run(ModelGetter.groovy:91)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>	at java.lang.reflect.Method.invoke(Method.java:497)
>>>	at org.codehaus.groovy.maven.runtime.support.ScriptExecutorSupport.invokeMethod(ScriptExecutorSupport.java:158)
>>>	... 32 more
>>>Caused by: java.net.ConnectException: Connection refused: connect
>>>	at java.net.DualStackPlainSocketImpl.connect0(Native Method)
>>>	at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
>>>	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
>>>	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>	at java.net.Socket.connect(Socket.java:589)
>>>	at java.net.Socket.connect(Socket.java:538)
>>>	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
>>>	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>>>	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>>>	at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
>>>	at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>>>	at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
>>>	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
>>>	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
>>>	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>>>	at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2943)
>>>	at java.net.URLConnection.getHeaderFieldLong(URLConnection.java:629)
>>>	at java.net.URLConnection.getContentLengthLong(URLConnection.java:501)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>	at java.lang.reflect.Method.invoke(Method.java:497)
>>>	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229)
>>>	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
>>>	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:43)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
>>>	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
>>>	at ModelGetter.downloadFile(ModelGetter.groovy:61)
>>>	... 42 more
>>>
>>>-----Original Message-----
>>>From: Nick Burch [mailto:apache@gagravarr.org]
>>>Sent: Thursday, November 19, 2015 7:41 PM
>>>To: dev@tika.apache.org
>>>Subject: Re: [DISCUSS] Moving to Git
>>>
>>>On Thu, 19 Nov 2015, Mattmann, Chris A (3980) wrote:
>>>> I’ll be happy to update our docs and to write a wiki page on using
>>>> Tika & Git that we can refer folks to. I think I’ve demonstrated
>>>> documenting things on the Tika wiki :)
>>>
>>>Great stuff! Scribble something sensible down, and I can vote +1 to
>>>the move, plus learn more about Git at the same time :)
>>>
>>>Nick
>>
>
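Editor's sketch, not part of the original thread: the two workarounds discussed above (getting proxy settings to the download code, and skipping the download when the model file was placed by hand) could look roughly like this in Java. This is not the actual ModelGetter.groovy code. The class name ProxyAwareDownload and both method names are hypothetical, and the sketch assumes the standard JVM properties http.proxyHost and http.proxyPort have actually reached the JVM running the download, which is exactly what Tim reports is not happening in his build.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical helper; names are illustrative and do not come from Tika.
public class ProxyAwareDownload {

    // Builds an HTTP proxy from http.proxyHost / http.proxyPort,
    // or returns Proxy.NO_PROXY (direct connection) when no host is set.
    public static Proxy proxyFrom(Properties props) {
        String host = props.getProperty("http.proxyHost");
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // 80 is the JVM's documented default for http.proxyPort.
        int port = Integer.parseInt(props.getProperty("http.proxyPort", "80"));
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host.trim(), port));
    }

    // Downloads url to target through the given proxy. Returns false without
    // touching the network when target already exists, e.g. because the
    // model file was copied there manually.
    public static boolean downloadIfMissing(URL url, Path target, Proxy proxy)
            throws IOException {
        if (Files.exists(target)) {
            return false;
        }
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        URLConnection conn = url.openConnection(proxy);
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ProxyAwareDownload <url> <target-file>");
            return;
        }
        Proxy proxy = proxyFrom(System.getProperties());
        boolean fetched = downloadIfMissing(new URL(args[0]), Paths.get(args[1]), proxy);
        System.out.println(fetched ? "downloaded " + args[1]
                                   : "already present, skipped " + args[1]);
    }
}
```

With the URL and path from the build log, this would be invoked with http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin and the ner-person.bin path under src/test/resources as the two arguments. Passing the Proxy explicitly to openConnection keeps the download from silently falling back to a direct connection, which is what the SocksSocketImpl frames in the trace above suggest happened.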
