[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071643#comment-14071643 ]
Hong-Thai Nguyen commented on TIKA-1373:
----------------------------------------
Can you format your description with the {code} annotation? Also, if I understand correctly,
the output of the first section is empty?
> AutoDetectParser extracts no text when SourceCodeParser is selected
> -------------------------------------------------------------------
>
> Key: TIKA-1373
> URL: https://issues.apache.org/jira/browse/TIKA-1373
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.5
> Reporter: Andrés Aguilar-Umaña
>
> When using the AutoDetectParser in Java code, and the SourceCodeParser is selected (e.g. for
> Java files), the handler gets no text.
> I have this test program:
> String data = "public class HelloWorld {}";
> ByteArrayInputStream bais = new ByteArrayInputStream(data.getBytes());
> Parser autoDetectParser = new AutoDetectParser();
> autoDetectParser = new SourceCodeParser();
> BodyContentHandler bch = new BodyContentHandler(50);
> ParseContext parseContext = new ParseContext();
> Metadata metadata = new Metadata();
> metadata.set(Metadata.CONTENT_TYPE, "text/x-java-source");
> try {
>     autoDetectParser.parse(bais, bch, metadata, parseContext);
> } catch (Exception e) {
>     e.printStackTrace();
> }
> System.out.println("Text extracted: " + bch.toString());
> It returns (using the SourceCodeParser):
> > Text extracted:
> But when I use this code:
> String data = "public class HelloWorld {}";
> ByteArrayInputStream bais = new ByteArrayInputStream(data.getBytes());
> Parser autoDetectParser = new AutoDetectParser();
> autoDetectParser = new SourceCodeParser();
> BodyContentHandler bch = new BodyContentHandler(50);
> ParseContext parseContext = new ParseContext();
> Metadata metadata = new Metadata();
> metadata.set(Metadata.CONTENT_TYPE, "text/plain");
> try {
>     autoDetectParser.parse(bais, bch, metadata, parseContext);
> } catch (Exception e) {
>     e.printStackTrace();
> }
> System.out.println("Text extracted: " + bch.toString());
> The Text Parser is used and I get:
> > Text extracted: public class HelloWorld {}
> I have also tested this command:
> > java -jar tika-app-1.5.jar -t D:\text.java
> (no text)
--
This message was sent by Atlassian JIRA
(v6.2#6252)