tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2636) ENVI Header metadata fields can span more than one line
Date Tue, 01 May 2018 20:28:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460114#comment-16460114 ]

Hudson commented on TIKA-2636:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika-trunk #1477 (See [https://builds.apache.org/job/Tika-trunk/1477/])
TIKA-2636 ENVI Header metadata fields can span more than one line (lewis.mcgibbney: [https://github.com/apache/tika/commit/ceb7b42ba2e342e7becb81d0c661ccd6209a915e])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/ang20150420t182050_corr_v1e_img.hdr
* (edit) tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) output.txt
TIKA-2636 ENVI Header metadata fields can span more than one line (lewis.mcgibbney: [https://github.com/apache/tika/commit/1fae340976e054bc8206cf79dff8b33758eebe82])
* (delete) output.txt
TIKA-2636 ENVI Header metadata fields can span more than one line (lewis.mcgibbney: [https://github.com/apache/tika/commit/d2c412940b976e607b698b598e25481495d0b8e4])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
TIKA-2636 ENVI Header metadata fields can span more than one line (lewis.mcgibbney: [https://github.com/apache/tika/commit/fb4e39323b1d0576ea8066065febae94765a96c2])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java


> ENVI Header metadata fields can span more than one line
> -------------------------------------------------------
>
>                 Key: TIKA-2636
>                 URL: https://issues.apache.org/jira/browse/TIKA-2636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.18
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>         Attachments: ang20150420t182050_corr_v1e_img.hdr
>
>
> [~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140]
> "...See below for how to read and output line by line (copy & paste between the xml start/end
> in EnviHeaderParser). I have a hunch this isn't really what we want -- what if a metadata
> field has a newline in it? What if the line is too long to fit into a string? On the other
> hand, with nice input, it's much nicer output."
> As it turns out, ENVI header metadata fields can span more than one line. An example is
> as follows:
> {code}
> 1.    ENVI
> 2.    description = {
> 3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
> 4.      Jun 10 04:48:52 2015]}
> 5.    samples = 739
> 6.    lines = 14674
> 7.    bands = 432
> 8.    header offset = 0
> 9.    file type = ENVI Standard
> 10.    data type = 4
> 11.    interleave = bil
> 12.    sensor type = Unknown
> 13.    byte order = 0
> 14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
> 15.    wavelength units = Nanometers
> ...
> {code}
> This happens when a metadata field value is enclosed in curly brackets. In the example
> above, lines 2-4 show a value spread over three lines, and line 14 shows a value
> contained within a single line.
> This requires a patch to fix the [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java]
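
For illustration only, here is a minimal sketch of one way the multi-line case described above might be handled. It is not the committed patch and does not use Tika's API: the class name EnviHeaderSketch and the method readFields are hypothetical. It assumes that every field has the form "key = value", that a value opened with "{" continues until a line containing "}", and that braces are never nested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika.
public class EnviHeaderSketch {

    /**
     * Reads "key = value" pairs from an ENVI-style header. A value that opens
     * a curly bracket without closing it on the same line is joined with the
     * following lines until the closing bracket is seen (or input ends).
     */
    public static Map<String, String> readFields(BufferedReader reader) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no key here, e.g. the leading "ENVI" line or a blank line
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            // Assumption: braces are not nested, so the first "}" ends the value.
            while (value.indexOf("{") >= 0 && value.indexOf("}") < 0
                    && (line = reader.readLine()) != null) {
                value.append(' ').append(line.trim());
            }
            fields.put(key, value.toString());
        }
        return fields;
    }
}
```

Because only the first "=" on a field's opening line is treated as the separator, entries such as "units=Meters" inside the "map info" value stay part of that value. The curly brackets are kept in the returned value; a real parser may prefer to strip them.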



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
