tika-dev mailing list archives

From "Dan Allen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1165) Autodetect and parse Asciidoc
Date Fri, 13 Jun 2014 09:12:02 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030410#comment-14030410 ]

Dan Allen commented on TIKA-1165:

I strongly recommend using AsciidoctorJ, as you have suggested. AsciidoctorJ is the official
and most comprehensive way of processing AsciiDoc on the JVM.


AsciidoctorJ uses JRuby to invoke Asciidoctor. You can either use the bundled JRuby runtime
or you can configure it to use an alternate JRuby runtime.

AsciidoctorJ is used by both GitBlit and Bintray to render AsciiDoc documents. Feel free to
reach out to those teams if you want feedback about how it performs.

> Autodetect and parse Asciidoc
> -----------------------------
>                 Key: TIKA-1165
>                 URL: https://issues.apache.org/jira/browse/TIKA-1165
>             Project: Tika
>          Issue Type: Wish
>          Components: languageidentifier, parser
>    Affects Versions: 1.4
>            Reporter: David Pilato
>            Priority: Trivial
> When parsing asciidoc metadata, we currently get the following:
> {noformat}
> Content-Encoding: ISO-8859-1
> Content-Length: 66363
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: asciidoc.adoc
> {noformat}
> Steps to reproduce:
> {code:title=asciidoc.sh|borderStyle=solid}
> curl https://raw.github.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc -O -s
> java -jar tika-app-1.4.jar -m asciidoc-syntax-quick-reference.adoc
> {code}
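
For illustration only: the text/plain result reported above could in principle be avoided by checking the resource name before falling back to plain text. The following is a minimal Java sketch of that idea, not Tika's actual detection mechanism. The class name AsciidocGuess, the extension list, and the media type name text/x-asciidoc are all assumptions made for this example.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: guesses a media type from a file name.
class AsciidocGuess {
    // Assumption: extensions commonly used for AsciiDoc files.
    static final Set<String> EXTENSIONS = Set.of("adoc", "asciidoc", "ad");
    // Assumption: an illustrative media type name for AsciiDoc.
    static final String ASCIIDOC_TYPE = "text/x-asciidoc";
    static final String FALLBACK_TYPE = "text/plain";

    static String guessType(String resourceName) {
        if (resourceName == null) {
            return FALLBACK_TYPE;
        }
        int dot = resourceName.lastIndexOf('.');
        // No extension, or a trailing dot: nothing to match against.
        if (dot < 0 || dot == resourceName.length() - 1) {
            return FALLBACK_TYPE;
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext) ? ASCIIDOC_TYPE : FALLBACK_TYPE;
    }

    public static void main(String[] args) {
        System.out.println(guessType("asciidoc.adoc")); // prints text/x-asciidoc
        System.out.println(guessType("notes.txt"));     // prints text/plain
    }
}
```

A name-based check like this says nothing about the content itself; parsing the document would still need a processor such as AsciidoctorJ, as recommended in the comment above.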

This message was sent by Atlassian JIRA
