tika-dev mailing list archives

From 122jxgcn <ywpar...@gmail.com>
Subject Custom parser error
Date Tue, 31 Jul 2012 08:48:25 GMT
Hi, I'm continuing my question from this post:
http://lucene.472066.n3.nabble.com/Convert-file-before-Tika-processes-it-td3990629.html

So, I wrote some code and a test, but the test is not passing.

In the test, I did something like this:


InputStream stream = HWPParserTest.class.getResourceAsStream(
        "/test-documents/testHWP.hwp");
try {
    parser.parse(stream, handler, metadata, context);
} finally {
    stream.close();
}


And my parser looks like


public void parse(
        InputStream stream, ContentHandler handler,
        Metadata metadata, ParseContext context)
        throws IOException, SAXException, TikaException {

    try {
        TikaInputStream tstream = TikaInputStream.cast(stream);

        if (tstream != null && tstream.hasFile()) {
            File f = tstream.getFile();
            Process ps = Runtime.getRuntime().exec("/hwp2xml.bin", null, f);
            new XMLParser().parse(ps.getInputStream(), handler, metadata, context);
        }
    } finally {
        stream.close();
    }

    metadata.set(Metadata.CONTENT_TYPE, HWP_MIME_TYPE);

    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    xhtml.endDocument();
}


Based on my findings, it seems that casting the InputStream to a
TikaInputStream is failing: the tstream variable ends up null, which
results in an error.
I'm not sure what's going wrong here, as I modeled my parser on the PDF
parser.
Any help, please?
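
A possible explanation, offered as a guess rather than a confirmed diagnosis: getResourceAsStream() returns an ordinary InputStream, and as far as I understand TikaInputStream.cast() only returns non-null when the stream already is a TikaInputStream. If the converter needs a real file on disk, the stream has to be spooled to a temporary file first. I believe TikaInputStream.get(stream) followed by getFile() is meant to do this inside Tika. The stdlib-only sketch below shows the same idea without Tika; the class name StreamSpooler and the method spoolToTempFile are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Tika: copies a plain stream to a temp file.
public class StreamSpooler {

    /**
     * Copies the whole stream into a new temporary file and returns that file.
     * The caller is responsible for deleting the file when done.
     */
    public static File spoolToTempFile(InputStream stream, String suffix)
            throws IOException {
        File tmp = File.createTempFile("spool-", suffix);
        // createTempFile already created an empty file, so overwrite it.
        Files.copy(stream, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        File f = spoolToTempFile(in, ".hwp");
        try {
            System.out.println(f.getName() + " holds " + f.length() + " bytes");
        } finally {
            f.delete();
        }
    }
}
```

With a file in hand, the parser no longer depends on the caller having passed a file-backed stream.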

Also, I'm not sure whether I wrote this part correctly:

File f = tstream.getFile();
Process ps = Runtime.getRuntime().exec("/hwp2xml.bin", null, f);
new XMLParser().parse(ps.getInputStream(), handler, metadata, context);
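
One thing that may be worth checking in the snippet above: in Runtime.exec(String, String[], File), the third argument is the working directory of the new process, not an input file, so passing f there probably does not hand the document to the converter. A minimal sketch of an alternative with ProcessBuilder follows. It assumes, without any confirmation, that hwp2xml.bin takes the input path as its first argument and writes XML to standard output. ConverterLauncher, buildCommand and start are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: launches an external converter on a given file.
public class ConverterLauncher {

    /**
     * Builds the command line: the converter executable followed by the
     * absolute path of the input file.
     * ASSUMPTION: the converter accepts the input path as its first argument.
     */
    public static List<String> buildCommand(String converterPath, File input) {
        return Arrays.asList(converterPath, input.getAbsolutePath());
    }

    /** Starts the converter; the caller reads its output via getInputStream(). */
    public static Process start(String converterPath, File input)
            throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(converterPath, input));
        // Send the child's stderr to our own stderr so that an unread
        // error stream cannot fill up and stall the converter.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb.start();
    }

    public static void main(String[] args) {
        // Only prints the command; nothing is executed here.
        System.out.println(buildCommand("/hwp2xml.bin", new File("testHWP.hwp")));
    }
}
```

The Process returned by start() would then be read through getInputStream() and handed to the XML parser, and calling waitFor() afterwards allows checking the exit code before trusting the output.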



--
View this message in context: http://lucene.472066.n3.nabble.com/Custom-parser-error-tp3998302.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.