Hey Max,
Let's replace it with CXF like I suggested -- it works fine. I'll take a pass at it (we used
Apache CXF
to replace Jersey in Apache OODT).
What do you think?
Cheers,
Chris
On Aug 2, 2011, at 4:45 AM, <maxcom@apache.org> <maxcom@apache.org> wrote:
> Author: maxcom
> Date: Tue Aug 2 11:45:50 2011
> New Revision: 1153097
>
> URL: http://svn.apache.org/viewvc?rev=1153097&view=rev
> Log:
> TIKA-593: Update tika-server
>
> 1) Synchronize codebase with our production code. This fixes few bugs and has cleaner
code
>
> 2) Update Jersey dependency to latest version (1.0.3 -> 1.7), port code to JAX-RS
1.1
>
> 3) Replace Grizzly web server with Jetty, that is available at maven central repository.
>
> Tika-server still requires java.net's Jersey artifact, I'm goint to try to replace Jersey
with
> Apache (Incubator) Wink
>
>
> Added:
> tika/trunk/tika-server/src/main/resources/tikaserver-version.properties
> Modified:
> tika/trunk/tika-server/pom.xml
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/MetadataResource.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/PartExtractor.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaExceptionMapper.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipOutput.java
> tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipUtils.java
> tika/trunk/tika-server/src/main/resources/commons-logging.properties
> tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
> tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java
> tika/trunk/tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
>
> Modified: tika/trunk/tika-server/pom.xml
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/pom.xml?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/pom.xml (original)
> +++ tika/trunk/tika-server/pom.xml Tue Aug 2 11:45:50 2011
> @@ -1,3 +1,20 @@
> +<!--
> + ~ Licensed to the Apache Software Foundation (ASF) under one or more
> + ~ contributor license agreements. See the NOTICE file distributed with
> + ~ this work for additional information regarding copyright ownership.
> + ~ The ASF licenses this file to You under the Apache License, Version 2.0
> + ~ (the "License"); you may not use this file except in compliance with
> + ~ the License. You may obtain a copy of the License at
> + ~
> + ~ http://www.apache.org/licenses/LICENSE-2.0
> + ~
> + ~ Unless required by applicable law or agreed to in writing, software
> + ~ distributed under the License is distributed on an "AS IS" BASIS,
> + ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + ~ See the License for the specific language governing permissions and
> + ~ limitations under the License.
> + -->
> +
> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
>
> <!--
> @@ -41,33 +58,38 @@
> <dependency>
> <groupId>com.sun.jersey</groupId>
> <artifactId>jersey-server</artifactId>
> - <version>1.0.3.1</version>
> + <version>1.7</version>
> </dependency>
> <dependency>
> <groupId>com.sun.jersey</groupId>
> <artifactId>jersey-core</artifactId>
> - <version>1.0.3.1</version>
> + <version>1.7</version>
> </dependency>
> <dependency>
> - <groupId>com.sun.jersey</groupId>
> - <artifactId>jersey-client</artifactId>
> - <version>1.0.3.1</version>
> - </dependency>
> - <dependency>
> <groupId>javax.ws.rs</groupId>
> <artifactId>jsr311-api</artifactId>
> - <version>1.0</version>
> + <version>1.1</version>
> </dependency>
> <dependency>
> - <groupId>com.sun.jersey.test.framework</groupId>
> - <artifactId>jersey-test-framework</artifactId>
> - <version>1.0.3.1</version>
> + <groupId>com.sun.jersey.jersey-test-framework</groupId>
> + <artifactId>jersey-test-framework-grizzly</artifactId>
> + <version>1.7</version>
> <scope>test</scope>
> </dependency>
> <dependency>
> - <groupId>com.sun.grizzly</groupId>
> - <artifactId>grizzly-servlet-webserver</artifactId>
> - <version>1.9.8</version>
> + <groupId>org.eclipse.jetty</groupId>
> + <artifactId>jetty-server</artifactId>
> + <version>8.0.0.M3</version>
> + </dependency>
> + <dependency>
> + <groupId>org.eclipse.jetty</groupId>
> + <artifactId>jetty-servlet</artifactId>
> + <version>8.0.0.M3</version>
> + </dependency>
> + <dependency>
> + <groupId>org.slf4j</groupId>
> + <artifactId>slf4j-jdk14</artifactId>
> + <version>1.6.1</version>
> </dependency>
> <dependency>
> <groupId>commons-logging</groupId>
> @@ -155,8 +177,8 @@
> <plugin>
> <groupId>org.apache.maven.plugins</groupId>
> <artifactId>maven-surefire-plugin</artifactId>
> + <version>2.6</version>
> <configuration>
> - <version>2.6</version>
> <redirectTestOutputToFile>true</redirectTestOutputToFile>
> <argLine>-da -XX:+HeapDumpOnOutOfMemoryError -Xmx512m</argLine>
> <!-- <argLine>-agentlib:jprofilerti=port=8849 -Xbootclasspath/a:/arc/opt/jprofiler5/bin/agent.jar</argLine>
-->
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/MetadataResource.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/MetadataResource.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/MetadataResource.java
(original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/MetadataResource.java
Tue Aug 2 11:45:50 2011
> @@ -14,17 +14,14 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import au.com.bytecode.opencsv.CSVWriter;
> +import org.apache.commons.logging.Log;
> +import org.apache.commons.logging.LogFactory;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> -import org.apache.tika.parser.Parser;
> -import org.apache.tika.parser.ParseContext;
> -import org.apache.tika.config.TikaConfig;
> -import org.apache.tika.detect.Detector;
> -import org.apache.tika.mime.MediaType;
> -import org.apache.tika.mime.MimeTypeException;
> import org.xml.sax.helpers.DefaultHandler;
>
> import javax.ws.rs.PUT;
> @@ -34,30 +31,27 @@ import javax.ws.rs.WebApplicationExcepti
> import javax.ws.rs.core.Context;
> import javax.ws.rs.core.HttpHeaders;
> import javax.ws.rs.core.StreamingOutput;
> +import javax.ws.rs.core.UriInfo;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.io.OutputStreamWriter;
> import java.util.ArrayList;
> import java.util.Arrays;
> -import java.util.List;
>
> -@Path("/meta")
> +@Path("/meta{id:(/.*)?}")
> public class MetadataResource {
> - private static final String CONTENT_LENGTH = "Content-Length";
> - private static final String FILE_NNAME = "File-Name";
> - private static final String RESOURCE_NAME = "resourceName";
> + private static final Log logger = LogFactory.getLog(MetadataResource.class);
>
> @PUT
> @Produces("text/csv")
> - public StreamingOutput getMetadata( InputStream is, @Context HttpHeaders httpHeaders
) throws Exception {
> - final Detector detector = new HeaderTrustingDetectorFactory ().createDetector( httpHeaders
);
> - final AutoDetectParser parser = new AutoDetectParser(detector);
> - final ParseContext context = new ParseContext();
> - context.set(Parser.class, parser);
> + public StreamingOutput getMetadata(InputStream is, @Context HttpHeaders httpHeaders,
@Context UriInfo info) throws Exception {
> final Metadata metadata = new Metadata();
> - parser.parse( is, new DefaultHandler(), metadata, context );
> - fillMetadata ( httpHeaders, metadata );
> + AutoDetectParser parser = TikaResource.createParser();
> + TikaResource.fillMetadata(parser, metadata, httpHeaders);
> + TikaResource.logRequest(logger, info, metadata);
> +
> + parser.parse(is, new DefaultHandler(), metadata);
>
> return new StreamingOutput() {
> public void write(OutputStream outputStream) throws IOException, WebApplicationException
{
> @@ -73,26 +67,4 @@ public class MetadataResource {
> }
> };
> }
> -
> - private void fillMetadata ( HttpHeaders httpHeaders, Metadata metadata ) {
> - final List < String > fileName = httpHeaders.getRequestHeader(FILE_NNAME),
cl = httpHeaders.getRequestHeader(CONTENT_LENGTH);
> - if ( cl != null && !cl.isEmpty() )
> - metadata.set( CONTENT_LENGTH, cl.get(0) );
> -
> - if ( fileName != null && !fileName.isEmpty() )
> - metadata.set( RESOURCE_NAME, fileName.get(0) );
> - }
> -
> - private static class HeaderTrustingDetectorFactory {
> - public Detector createDetector( HttpHeaders httpHeaders ) throws IOException, MimeTypeException
{
> - final javax.ws.rs.core.MediaType mediaType = httpHeaders.getMediaType();
> - if (mediaType == null || mediaType.equals(javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM_TYPE
))
> - return (new TikaConfig()).getMimeRepository();
> - else return new Detector() {
> - public MediaType detect(InputStream inputStream, Metadata metadata) throws IOException
{
> - return MediaType.parse( mediaType.toString() );
> - }
> - };
> - }
> - }
> }
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/PartExtractor.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/PartExtractor.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/PartExtractor.java (original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/PartExtractor.java Tue
Aug 2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import java.io.IOException;
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaExceptionMapper.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaExceptionMapper.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaExceptionMapper.java
(original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaExceptionMapper.java
Tue Aug 2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import org.apache.tika.exception.TikaException;
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java (original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java Tue
Aug 2 11:45:50 2011
> @@ -14,38 +14,48 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> import org.apache.poi.EncryptedDocumentException;
> +import org.apache.poi.extractor.ExtractorFactory;
> import org.apache.poi.hwpf.OldWordFileFormatException;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.exception.TikaException;
> +import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> +import org.apache.tika.metadata.TikaMetadataKeys;
> import org.apache.tika.mime.MediaType;
> -import org.apache.tika.parser.AbstractParser;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> +import org.apache.tika.parser.Parser;
> +import org.apache.tika.parser.html.HtmlParser;
> import org.apache.tika.sax.BodyContentHandler;
> +import org.apache.tika.sax.WriteOutContentHandler;
> +import org.xml.sax.Attributes;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
>
> import javax.ws.rs.*;
> -import javax.ws.rs.core.Context;
> -import javax.ws.rs.core.HttpHeaders;
> -import javax.ws.rs.core.Response;
> -import javax.ws.rs.core.StreamingOutput;
> +import javax.ws.rs.core.*;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> +import java.util.List;
> +import java.util.Map;
> import java.util.Set;
>
> -@Path("/tika")
> +@Path("/tika{id:(/.*)?}")
> public class TikaResource {
> public static final String GREETING = "This is Tika Server. Please PUT\n";
> private final Log logger = LogFactory.getLog(TikaResource.class);
>
> + static {
> + ExtractorFactory.setAllThreadsPreferEventExtractors(true);
> + }
> +
> @SuppressWarnings({"SameReturnValue"})
> @GET
> @Produces("text/plain")
> @@ -56,7 +66,11 @@ public class TikaResource {
> public static AutoDetectParser createParser() {
> final AutoDetectParser parser = new AutoDetectParser();
>
> - parser.setFallback(new AbstractParser() {
> + Map<MediaType,Parser> parsers = parser.getParsers();
> + parsers.put(MediaType.APPLICATION_XML, new HtmlParser());
> + parser.setParsers(parsers);
> +
> + parser.setFallback(new Parser() {
> public Set<MediaType> getSupportedTypes(ParseContext parseContext) {
> return parser.getSupportedTypes(parseContext);
> }
> @@ -64,15 +78,31 @@ public class TikaResource {
> public void parse(InputStream inputStream, ContentHandler contentHandler, Metadata
metadata, ParseContext parseContext) {
> throw new WebApplicationException(Response.Status.UNSUPPORTED_MEDIA_TYPE);
> }
> +
> + public void parse(InputStream inputStream, ContentHandler contentHandler, Metadata
metadata) {
> + throw new WebApplicationException(Response.Status.UNSUPPORTED_MEDIA_TYPE);
> + }
> });
>
> return parser;
> }
>
> public static void fillMetadata(AutoDetectParser parser, Metadata metadata, HttpHeaders
httpHeaders) {
> + List<String> fileName = httpHeaders.getRequestHeader("File-Name");
> + if (fileName!=null && !fileName.isEmpty()) {
> + metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, fileName.get(0));
> + }
> +
> javax.ws.rs.core.MediaType mediaType = httpHeaders.getMediaType();
> + if (mediaType!=null && "xml".equals(mediaType.getSubtype()) ) {
> + mediaType = null;
> + }
> +
> + if (mediaType !=null && mediaType.equals(javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM_TYPE))
{
> + mediaType = null;
> + }
>
> - if (mediaType !=null && !mediaType.equals(javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM_TYPE))
{
> + if (mediaType !=null) {
> metadata.add(org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE, mediaType.toString());
>
> final Detector detector = parser.getDetector();
> @@ -94,18 +124,41 @@ public class TikaResource {
> @PUT
> @Consumes("*/*")
> @Produces("text/plain")
> - public StreamingOutput getText(final InputStream is, @Context HttpHeaders httpHeaders)
{
> + public StreamingOutput getText(final InputStream is, @Context HttpHeaders httpHeaders,
@Context final UriInfo info) {
> final AutoDetectParser parser = createParser();
> final Metadata metadata = new Metadata();
>
> fillMetadata(parser, metadata, httpHeaders);
>
> + logRequest(logger, info, metadata);
> +
> return new StreamingOutput() {
> public void write(OutputStream outputStream) throws IOException, WebApplicationException
{
> - BodyContentHandler body = new BodyContentHandler(outputStream);
> + BodyContentHandler body = new BodyContentHandler(new WriteOutContentHandler(outputStream)
{
> + @Override
> + public void startElement(String uri, String localName, String qName, Attributes
attributes) throws SAXException {
> + super.startElement(uri, localName, qName, attributes);
> +
> + if ("img".equals(localName) && attributes.getValue("alt")!=null)
{
> + String nfo = "[image: "+attributes.getValue("alt")+ ']';
> +
> + characters(nfo.toCharArray(), 0, nfo.length());
> + }
> +
> + if ("a".equals(localName) && attributes.getValue("name")!=null)
{
> + String nfo = "[bookmark: "+attributes.getValue("name")+ ']';
> +
> + characters(nfo.toCharArray(), 0, nfo.length());
> + }
> + }
> + });
> +
> + TikaInputStream tis = TikaInputStream.get(is);
>
> try {
> - parser.parse(is, body, metadata);
> + tis.getFile();
> +
> + parser.parse(tis, body, metadata);
> } catch (SAXException e) {
> throw new WebApplicationException(e);
> } catch (TikaException e) {
> @@ -125,11 +178,31 @@ public class TikaResource {
> throw new WebApplicationException(Response.status(422).build());
> }
>
> - logger.warn("Text extraction failed", e);
> + logger.warn(String.format(
> + "%s: Text extraction failed",
> + info.getPath()
> + ), e);
>
> throw new WebApplicationException(Response.Status.INTERNAL_SERVER_ERROR);
> + } finally {
> + tis.close();
> }
> }
> };
> }
> +
> + public static void logRequest(Log logger, UriInfo info, Metadata metadata) {
> + if (metadata.get(org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE)==null) {
> + logger.info(String.format(
> + "%s (autodetecting type)",
> + info.getPath()
> + ));
> + } else {
> + logger.info(String.format(
> + "%s (%s)",
> + info.getPath(),
> + metadata.get(org.apache.tika.metadata.HttpHeaders.CONTENT_TYPE)
> + ));
> + }
> + }
> }
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java (original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java Tue
Aug 2 11:45:50 2011
> @@ -14,42 +14,45 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> -import com.sun.grizzly.http.SelectorThread;
> -import com.sun.jersey.api.container.grizzly.GrizzlyWebContainerFactory;
> +import com.sun.jersey.api.core.PackagesResourceConfig;
> +import com.sun.jersey.spi.container.servlet.ServletContainer;
> import org.apache.commons.cli.*;
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> +import org.eclipse.jetty.server.Server;
> +import org.eclipse.jetty.servlet.ServletContextHandler;
> +import org.eclipse.jetty.servlet.ServletHolder;
>
> -import javax.ws.rs.core.UriBuilder;
> -import java.net.URI;
> -import java.util.HashMap;
> -import java.util.Map;
> +import java.io.IOException;
> +import java.util.Properties;
>
> public class TikaServerCli {
> private static final Log logger = LogFactory.getLog(TikaServerCli.class);
> -
> public static final int DEFAULT_PORT = 9998;
>
> private static Options getOptions() {
> Options options = new Options();
> options.addOption("p", "port", true, "listen port (default = "+DEFAULT_PORT+ ')');
> -
> options.addOption("h", "help", false, "this help message");
>
> return options;
> }
>
> public static void main(String[] args) {
> + Properties properties = new Properties();
> try {
> - TikaServerCli cli = new TikaServerCli();
> -
> - Map<String, String> params = new HashMap<String, String>();
> + properties.load(ClassLoader.getSystemResourceAsStream("tikaserver-version.properties"));
> + } catch (IOException e) {
> + throw new RuntimeException(e);
> + }
>
> - params.put("com.sun.jersey.config.property.packages", "org.apache.tika.server");
> + logger.info("Starting Tikaserver "+properties.getProperty("tikaserver.version"));
>
> - Options options = cli.getOptions();
> + try {
> + Options options = getOptions();
>
> CommandLineParser cliParser = new GnuParser();
> CommandLine line = cliParser.parse(options, args);
> @@ -59,18 +62,24 @@ public class TikaServerCli {
> if (line.hasOption("port")) {
> port = Integer.valueOf(line.getOptionValue("port"));
> }
> -
> if (line.hasOption("help")) {
> HelpFormatter helpFormatter = new HelpFormatter();
> helpFormatter.printHelp("tikaserver", options);
> System.exit(-1);
> }
>
> - String baseUri = "http://localhost/";
> - URI buildUri = UriBuilder.fromUri(baseUri).port(port).build();
> - SelectorThread threadSelector = GrizzlyWebContainerFactory.create(buildUri, params);
> + Server server = new Server(port);
> + ServletContextHandler context = new ServletContextHandler(ServletContextHandler.NO_SESSIONS);
> + context.setContextPath("/");
> + server.setHandler(context);
> +
> + context.addServlet(new ServletHolder(new ServletContainer(new PackagesResourceConfig("org.apache.tika.server"))),
"/*");
> +
> + server.start();
> +
> + logger.info("Started");
>
> - logger.info("Started at " + buildUri);
> + server.join();
> } catch (Exception ex) {
> logger.fatal("Can't start", ex);
> System.exit(-1);
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java
(original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java
Tue Aug 2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import org.apache.commons.lang.mutable.MutableInt;
> @@ -24,13 +25,13 @@ import org.apache.poi.poifs.filesystem.O
> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
> import org.apache.poi.util.IOUtils;
> import org.apache.tika.config.TikaConfig;
> +import org.apache.tika.exception.TikaException;
> import org.apache.tika.extractor.EmbeddedDocumentExtractor;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.metadata.TikaMetadataKeys;
> -import org.apache.tika.mime.MediaType;
> import org.apache.tika.mime.MimeTypeException;
> +import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> -import org.apache.tika.parser.Parser;
> import org.apache.tika.parser.microsoft.OfficeParser;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
> @@ -40,15 +41,15 @@ import javax.ws.rs.PUT;
> import javax.ws.rs.Path;
> import javax.ws.rs.Produces;
> import javax.ws.rs.WebApplicationException;
> -import javax.ws.rs.core.Context;
> -import javax.ws.rs.core.HttpHeaders;
> -import javax.ws.rs.core.Response;
> -import javax.ws.rs.core.StreamingOutput;
> -import java.io.*;
> +import javax.ws.rs.core.*;
> +import java.io.ByteArrayInputStream;
> +import java.io.ByteArrayOutputStream;
> +import java.io.IOException;
> +import java.io.InputStream;
> import java.util.Collections;
> import java.util.zip.ZipOutputStream;
>
> -@Path("/unpacker")
> +@Path("/unpacker{id:(/.*)?}")
> public class UnpackerResource {
> private static final Log logger = LogFactory.getLog(UnpackerResource.class);
>
> @@ -62,25 +63,15 @@ public class UnpackerResource {
> @Produces("application/zip")
> public StreamingOutput getText(
> InputStream is,
> - @Context HttpHeaders httpHeaders
> + @Context HttpHeaders httpHeaders,
> + @Context UriInfo info
> ) throws Exception {
> - if (!is.markSupported()) {
> - is = new BufferedInputStream(is);
> - }
> -
> - Parser parser;
> + Metadata metadata = new Metadata();
>
> - javax.ws.rs.core.MediaType mediaType = httpHeaders.getMediaType();
> - if (mediaType !=null && !mediaType.equals(javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM_TYPE))
{
> - parser = tikaConfig.getParser(new MediaType(httpHeaders.getMediaType().getType(),
httpHeaders.getMediaType().getSubtype()));
> - } else {
> - MediaType type = tikaConfig.getMimeRepository().detect(is, new Metadata());
> - parser = tikaConfig.getParser(type);
> - }
> + AutoDetectParser parser = TikaResource.createParser();
>
> - if (parser==null) {
> - throw new WebApplicationException(Response.Status.UNSUPPORTED_MEDIA_TYPE);
> - }
> + TikaResource.fillMetadata(parser, metadata, httpHeaders);
> + TikaResource.logRequest(logger, info, metadata);
>
> ContentHandler ch = new DefaultHandler();
>
> @@ -91,9 +82,16 @@ public class UnpackerResource {
>
> pc.set(EmbeddedDocumentExtractor.class, new MyEmbeddedDocumentExtractor(count, zout));
>
> - parser.parse(is, ch, new Metadata(), pc);
> + try {
> + parser.parse(is, ch, metadata, pc);
> + } catch (TikaException ex) {
> + logger.warn(String.format(
> + "%s: Unpacker failed",
> + info.getPath()
> + ), ex);
> + }
>
> - if (count.intValue()==0) {
> + if (count.intValue() == 0) {
> throw new WebApplicationException(Response.Status.NO_CONTENT);
> }
>
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipOutput.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipOutput.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipOutput.java (original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipOutput.java Tue Aug
2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import javax.ws.rs.WebApplicationException;
>
> Modified: tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipUtils.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipUtils.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipUtils.java (original)
> +++ tika/trunk/tika-server/src/main/java/org/apache/tika/server/ZipUtils.java Tue Aug
2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import java.io.IOException;
>
> Modified: tika/trunk/tika-server/src/main/resources/commons-logging.properties
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/resources/commons-logging.properties?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/main/resources/commons-logging.properties (original)
> +++ tika/trunk/tika-server/src/main/resources/commons-logging.properties Tue Aug 2 11:45:50
2011
> @@ -1,3 +1,4 @@
> +#
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements. See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> @@ -12,4 +13,5 @@
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> +#
> org.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger
>
> Added: tika/trunk/tika-server/src/main/resources/tikaserver-version.properties
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/main/resources/tikaserver-version.properties?rev=1153097&view=auto
> ==============================================================================
> --- tika/trunk/tika-server/src/main/resources/tikaserver-version.properties (added)
> +++ tika/trunk/tika-server/src/main/resources/tikaserver-version.properties Tue Aug
2 11:45:50 2011
> @@ -0,0 +1,18 @@
> +#
> +# Licensed to the Apache Software Foundation (ASF) under one or more
> +# contributor license agreements. See the NOTICE file distributed with
> +# this work for additional information regarding copyright ownership.
> +# The ASF licenses this file to You under the Apache License, Version 2.0
> +# (the "License"); you may not use this file except in compliance with
> +# the License. You may obtain a copy of the License at
> +#
> +# http://www.apache.org/licenses/LICENSE-2.0
> +#
> +# Unless required by applicable law or agreed to in writing, software
> +# distributed under the License is distributed on an "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +# See the License for the specific language governing permissions and
> +# limitations under the License.
> +#
> +
> +tikaserver.version=${project.version}
>
> Modified: tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
(original)
> +++ tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
Tue Aug 2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import au.com.bytecode.opencsv.CSVReader;
> @@ -36,7 +37,7 @@ public class MetadataResourceTest extend
> @Test
> public void testSimpleWord() throws Exception {
> Reader reader =
> - webResource.path(META_PATH)
> + resource().path(META_PATH)
> .type("application/msword")
> .put(Reader.class, ClassLoader.getSystemResourceAsStream(TikaResourceTest.TEST_DOC));
>
>
> Modified: tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java
(original)
> +++ tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java
Tue Aug 2 11:45:50 2011
> @@ -14,6 +14,7 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> import com.sun.jersey.api.client.ClientResponse;
> @@ -38,14 +39,14 @@ public class TikaResourceTest extends Je
> */
> @Test
> public void testHelloWorld() {
> - String responseMsg = webResource.path(TIKA_PATH).get(String.class);
> + String responseMsg = resource().path(TIKA_PATH).get(String.class);
> assertEquals(TikaResource.GREETING, responseMsg);
> }
>
> @Test
> public void testSimpleWord() {
> String responseMsg =
> - webResource.path(TIKA_PATH)
> + resource().path(TIKA_PATH)
> .type("application/msword")
> .put(String.class, ClassLoader.getSystemResourceAsStream(TEST_DOC));
>
> @@ -54,7 +55,7 @@ public class TikaResourceTest extends Je
>
> @Test
> public void testApplicationWadl() {
> - String serviceWadl = webResource.path("application.wadl").
> + String serviceWadl = resource().path("application.wadl").
> accept(MediaTypes.WADL).get(String.class);
>
> assertTrue(serviceWadl.length() > 0);
> @@ -63,7 +64,7 @@ public class TikaResourceTest extends Je
> @Test
> public void testPasswordXLS() throws Exception {
> ClientResponse cr =
> - webResource
> + resource()
> .path(TIKA_PATH)
> .type("application/vnd.ms-excel")
> .put(ClientResponse.class, ClassLoader.getSystemResourceAsStream("password.xls"));
>
> Modified: tika/trunk/tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
> URL: http://svn.apache.org/viewvc/tika/trunk/tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java?rev=1153097&r1=1153096&r2=1153097&view=diff
> ==============================================================================
> --- tika/trunk/tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
(original)
> +++ tika/trunk/tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
Tue Aug 2 11:45:50 2011
> @@ -14,9 +14,9 @@
> * See the License for the specific language governing permissions and
> * limitations under the License.
> */
> +
> package org.apache.tika.server;
>
> -import com.sun.jersey.api.client.ClientResponse;
> import com.sun.jersey.test.framework.JerseyTest;
> import org.apache.commons.codec.digest.DigestUtils;
> import org.apache.tika.io.IOUtils;
> @@ -29,8 +29,6 @@ import java.util.zip.ZipEntry;
> import java.util.zip.ZipInputStream;
>
> import static org.junit.Assert.assertEquals;
> -import static org.junit.Assert.assertNotNull;
> -import static org.junit.Assert.assertTrue;
>
> public class UnpackerResourceTest extends JerseyTest {
> private static final String UNPACKER_PATH = "/unpacker";
> @@ -71,7 +69,7 @@ public class UnpackerResourceTest extend
> @Test
> public void testDocWAV() throws Exception {
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .type(APPLICATION_MSWORD)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream(TEST_DOC_WAV));
> @@ -87,7 +85,7 @@ public class UnpackerResourceTest extend
> @Test
> public void testDocPicture() throws Exception {
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .type(APPLICATION_MSWORD)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream(TEST_DOC_WAV));
> @@ -102,7 +100,7 @@ public class UnpackerResourceTest extend
> @Test
> public void testDocPictureNoOle() throws Exception {
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .type(APPLICATION_MSWORD)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream("2pic.doc"));
> @@ -117,7 +115,7 @@ public class UnpackerResourceTest extend
> @Test
> public void testImageDOCX() throws Exception {
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream(TEST_DOCX_IMAGE));
>
> @@ -133,7 +131,7 @@ public class UnpackerResourceTest extend
> public void testExeDOCX() throws Exception {
> String TEST_DOCX_EXE = "2exe.docx";
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream(TEST_DOCX_EXE));
>
> @@ -163,7 +161,7 @@ public class UnpackerResourceTest extend
> @Test
> public void testImageXSL() throws Exception {
> InputStream is =
> - webResource
> + resource()
> .path(UNPACKER_PATH)
> .put(InputStream.class, ClassLoader.getSystemResourceAsStream("pic.xls"));
>
>
>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|