tika-dev mailing list archives

From "Dan Allen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1165) Autodetect and parse Asciidoc
Date Fri, 13 Jun 2014 09:12:02 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030410#comment-14030410 ]

Dan Allen commented on TIKA-1165:
---------------------------------

I strongly recommend using AsciidoctorJ, as you have suggested. AsciidoctorJ is the official
and most comprehensive way of processing AsciiDoc on the JVM.

http://asciidoctor.org/docs/install-and-use-asciidoctor-java-integration/
https://github.com/asciidoctor/asciidoctorj

AsciidoctorJ uses JRuby to invoke Asciidoctor. You can either use the bundled JRuby runtime
or you can configure it to use an alternate JRuby runtime.

AsciidoctorJ is used by both GitBlit and Bintray to render AsciiDoc documents. Feel free to
reach out to those teams if you want feedback about how it performs.

> Autodetect and parse Asciidoc
> -----------------------------
>
>                 Key: TIKA-1165
>                 URL: https://issues.apache.org/jira/browse/TIKA-1165
>             Project: Tika
>          Issue Type: Wish
>          Components: languageidentifier, parser
>    Affects Versions: 1.4
>            Reporter: David Pilato
>            Priority: Trivial
>
> When parsing asciidoc metadata, we currently get the following:
> {noformat}
> Content-Encoding: ISO-8859-1
> Content-Length: 66363
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: asciidoc.adoc
> {noformat}
> Steps to reproduce:
> {code:title=asciidoc.sh|borderStyle=solid}
> curl https://raw.github.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc -O -s
> java -jar tika-app-1.4.jar -m asciidoc-syntax-quick-reference.adoc
> {code}
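
To illustrate the detection gap reported above (an .adoc file identified as text/plain), here is a minimal sketch of an extension-based check. It is not Tika's implementation and does not use Tika's or AsciidoctorJ's APIs. The class name AsciidocSniffer, the media type string text/x-asciidoc, and the extension list are all assumptions made for this example.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika or AsciidoctorJ.
class AsciidocSniffer {
    // Assumed media type name for AsciiDoc; the issue does not specify one.
    static final String ASCIIDOC = "text/x-asciidoc";
    // Fallback matching what the issue reports today.
    static final String PLAIN = "text/plain";

    // Assumed list of AsciiDoc file extensions (lowercase, without the dot).
    private static final Set<String> EXTENSIONS = Set.of("adoc", "asciidoc");

    // Returns the assumed AsciiDoc type when the resource name ends in a
    // known extension, otherwise falls back to text/plain.
    static String detect(String resourceName) {
        if (resourceName == null) {
            return PLAIN;
        }
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            return PLAIN; // no extension at all, or a trailing dot
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext) ? ASCIIDOC : PLAIN;
    }

    public static void main(String[] args) {
        System.out.println(detect("asciidoc.adoc")); // prints text/x-asciidoc
        System.out.println(detect("notes.txt"));     // prints text/plain
    }
}
```

A name-based check like this only covers files that carry a recognizable extension. Detecting AsciiDoc from content alone would need additional heuristics that are not sketched here.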



--
This message was sent by Atlassian JIRA
(v6.2#6252)
