tika-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (TIKA-1425) Automatic batching of Microsoft service calls
Date Tue, 03 Feb 2015 05:46:35 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney reassigned TIKA-1425:
------------------------------------------

    Assignee: Lewis John McGibbney

> Automatic batching of Microsoft service calls
> ---------------------------------------------
>
>                 Key: TIKA-1425
>                 URL: https://issues.apache.org/jira/browse/TIKA-1425
>             Project: Tika
>          Issue Type: Improvement
>          Components: translation
>    Affects Versions: 1.6
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.8
>
>
> Right now, when I use the following code, I get the stack trace at the bottom of this description.
This seems to be because the request URI is too long for the service to accept (HTTP 414). We need
a mechanism within the call to Tika.translate which will, on a service-by-service
basis, determine the maximum request URI length that can be sent. I believe this should be handled
on the Tika side; how else am I meant to know the maximum request size?
> {code:title=translator.java|borderStyle=solid}
> +    Translator translate = new MicrosoftTranslator();
> +    ((MicrosoftTranslator) translate).setId("...");
> +    ((MicrosoftTranslator) translate).setSecret("...");
>      for (java.util.Map.Entry<Text, Parse> entry : parseResult) {
>        Parse parse = entry.getValue();
>        LOG.info("---------\nUrl\n---------------\n");
> @@ -201,7 +207,7 @@
>        System.out.print(parse.getData().toString());
>        if (dumpText) {
>          LOG.info("---------\nParseText\n---------\n");
> -        System.out.print(parse.getText());
> +        System.out.print(translate.translate(parse.getText(), "fr"));
>        }
> {code}
> {code:title=stacktrace.log|borderStyle=solid}
> Exception in thread "main" java.lang.Exception: [microsoft-translator-api] Error retrieving
translation : Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0...
> ...
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:202)
> 	at com.memetix.mst.translate.Translate.execute(Translate.java:61)
> 	at com.memetix.mst.translate.Translate.execute(Translate.java:76)
> 	at org.apache.tika.language.translate.MicrosoftTranslator.translate(MicrosoftTranslator.java:104)
> 	at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:210)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:228)
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE%D1%80%D1%83%D0%B...
> ...
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
> 	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:178)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:199)
> 	... 6 more
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE...
> ...
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
> 	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
> 	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:177)
> 	... 7 more
> {code}
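
One way to picture the batching requested above is a small helper that splits the input into chunks whose URL-encoded length stays under a per-service limit, translates each chunk, and joins the results. The following is only a sketch under stated assumptions, not Tika's implementation: the class name RequestBatcher, its methods, and the maxEncoded parameter are hypothetical, and the real limit for the Microsoft service is not given in this issue. The translator is passed in as a plain function so the sketch does not depend on any particular Translator API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Tika: splits text so that each piece,
// once URL-encoded, fits within an assumed per-request budget.
final class RequestBatcher {

    /** Length of the text after URL-encoding as UTF-8, i.e. its cost in the request URI. */
    static int encodedLength(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /**
     * Splits text into consecutive chunks whose encoded length is at most maxEncoded.
     * Chunks end after whitespace where possible; concatenating them yields the input.
     */
    static List<String> split(String text, int maxEncoded) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = start;
            int lastBreak = -1; // index just past the last whitespace seen in this chunk
            int used = 0;
            while (end < text.length()) {
                int cp = text.codePointAt(end);
                int width = Character.charCount(cp);
                int cost = encodedLength(text.substring(end, end + width));
                if (used + cost > maxEncoded) {
                    break;
                }
                used += cost;
                end += width;
                if (Character.isWhitespace(cp)) {
                    lastBreak = end;
                }
            }
            if (end == start) {
                throw new IllegalArgumentException("maxEncoded is too small for a single character");
            }
            // Prefer to cut at a word boundary unless this is the final chunk.
            if (end < text.length() && lastBreak != -1) {
                end = lastBreak;
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    /** Translates each chunk with the supplied function and joins the results in order. */
    static String translateInBatches(String text, int maxEncoded,
                                     Function<String, String> translator) {
        StringBuilder out = new StringBuilder();
        for (String chunk : split(text, maxEncoded)) {
            out.append(translator.apply(chunk));
        }
        return out.toString();
    }
}
```

Measuring the encoded length rather than the character count matters for input like the Cyrillic text in the stack trace, where each character expands to six characters in the URI. Translating chunks independently may reduce quality at chunk boundaries, so a real implementation would probably prefer sentence boundaries over plain whitespace.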



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
