tika-dev mailing list archives

From "Dave Meikle (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2630) Wrong height and width metadata for JPEG images
Date Tue, 30 Oct 2018 00:10:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667918#comment-16667918
] 

Dave Meikle commented on TIKA-2630:
-----------------------------------

After writing it, I know it really won't, given the clash of metadata keys between the Exif
directories.

I wonder whether, in the short term, we could add the directory name as a key qualifier for
Exif information only, since that is where the issue occurs at the moment.

Will create a proposed pull request and see what others think.
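
To make the short-term idea concrete, here is a minimal sketch of directory-qualified keys. It
is not Tika code: the class name, the `Tag` holder, the `:` separator, and the rule "qualify
only directories whose name starts with Exif" are all assumptions for illustration, and plain
maps stand in for the Tika `Metadata` object. The sample values are the ones from the issue
description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika or metadata-extractor.
final class QualifiedExifKeys {

    // One extracted tag: the directory it came from, its tag name, and its description.
    static final class Tag {
        final String directory;
        final String name;
        final String value;

        Tag(String directory, String name, String value) {
            this.directory = directory;
            this.name = name;
            this.value = value;
        }
    }

    // Prefix the key with the directory name for Exif directories only (assumed rule).
    static String keyFor(Tag tag) {
        return tag.directory.startsWith("Exif")
                ? tag.directory + ":" + tag.name
                : tag.name;
    }

    // Copy tags into a flat map; a later tag with the same key overwrites an earlier one.
    static Map<String, String> toMetadata(List<Tag> tags) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Tag tag : tags) {
            metadata.put(keyFor(tag), tag.value);
        }
        return metadata;
    }

    public static void main(String[] args) {
        List<Tag> tags = List.of(
                new Tag("JPEG", "Image Width", "1535 pixels"),
                new Tag("Exif IFD0", "Image Width", "2173 pixels"),
                new Tag("Exif SubIFD", "Exif Image Width", "1535 pixels"));
        toMetadata(tags).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the qualifier in place, the JPEG "Image Width" entry keeps its own key and the Exif IFD0
value is stored separately instead of replacing it.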

> Wrong height and width metadata for JPEG images
> -----------------------------------------------
>
>                 Key: TIKA-2630
>                 URL: https://issues.apache.org/jira/browse/TIKA-2630
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ancuta Morarasu
>            Assignee: Dave Meikle
>            Priority: Major
>         Attachments: Tika-metadata.txt, metadata-exctractor-metadata.txt, sizesampleissue.jpg
>
>
> According to the [Exif specification|http://www.exif.org/Exif2-2.PDF#page=73&zoom=auto,-176,103],
> for compressed images the values for width and height should come from the tags:
> * *PixelXDimension*, mapped in metadata-extractor to {{com.drew.metadata.exif.ExifDirectoryBase.TAG_EXIF_IMAGE_WIDTH}}, and
> * *PixelYDimension*, mapped to {{ExifDirectoryBase.TAG_EXIF_IMAGE_HEIGHT}}.
> {{ImageMetadataExtractor$ExifHandler.[handlePhotoTags(...)|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L487]}}
> should extract and set these in the metadata:
> {code:java}
>   // Width comes from PixelXDimension in the Exif SubIFD.
>   if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)) {
>       metadata.set(Metadata.IMAGE_WIDTH,
>                    trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)));
>   }
>   // Height comes from PixelYDimension; check the height tag, not the width tag.
>   if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)) {
>       metadata.set(Metadata.IMAGE_LENGTH,
>                    trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)));
>   }
> {code}
> Also, the {{CopyUnknownFieldsHandler}} overwrites the values for "Image Width" ({{JpegDirectory.TAG_IMAGE_WIDTH}})
> and "Image Height" ({{JpegDirectory.TAG_IMAGE_HEIGHT}}) with the values from {{ExifIFD0Directory.TAG_IMAGE_WIDTH}}
> and {{ExifIFD0Directory.TAG_IMAGE_HEIGHT}}, because the tags have the same names.
> I attached a sample image; these are the metadata values:
> * extracted by metadata-extractor:
> [JPEG] Image Height = 367 pixels
> [JPEG] Image Width = 1535 pixels
> [Exif IFD0] Image Width = 2173 pixels
> [Exif IFD0] Image Height = 520 pixels
> [Exif SubIFD] Exif Image Width = 1535 pixels
> [Exif SubIFD] Exif Image Height = 367 pixels
> * Tika metadata:
> Image Height: 520 pixels
> Image Width: 2173 pixels
> tiff:ImageLength: 520
> tiff:ImageWidth: 2173
> Exif Image Height: 367 pixels
> Exif Image Width: 1535 pixels
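
The selection rule the report asks for can be sketched against the sample values listed above.
This is only an illustration: nested maps stand in for metadata-extractor directories,
`stripPixelsSuffix` is a hypothetical stand-in for the `trimPixels` helper, and falling back to
the JPEG directory when the Exif SubIFD tags are missing is an assumption, not something the
issue specifies.

```java
import java.util.Map;

// Hypothetical sketch: plain maps stand in for metadata-extractor directories.
final class DimensionPicker {

    // Strip a trailing " pixels" so that "1535 pixels" becomes "1535".
    static String stripPixelsSuffix(String description) {
        String suffix = " pixels";
        return description.endsWith(suffix)
                ? description.substring(0, description.length() - suffix.length())
                : description;
    }

    // Return the trimmed value of a tag in a directory, or null if either is absent.
    static String lookup(Map<String, Map<String, String>> dirs, String dir, String tag) {
        Map<String, String> tags = dirs.get(dir);
        String value = (tags == null) ? null : tags.get(tag);
        return (value == null) ? null : stripPixelsSuffix(value);
    }

    // Prefer PixelXDimension (Exif SubIFD); fall back to the JPEG directory (assumption).
    static String width(Map<String, Map<String, String>> dirs) {
        String exif = lookup(dirs, "Exif SubIFD", "Exif Image Width");
        return (exif != null) ? exif : lookup(dirs, "JPEG", "Image Width");
    }

    // Prefer PixelYDimension (Exif SubIFD); fall back to the JPEG directory (assumption).
    static String height(Map<String, Map<String, String>> dirs) {
        String exif = lookup(dirs, "Exif SubIFD", "Exif Image Height");
        return (exif != null) ? exif : lookup(dirs, "JPEG", "Image Height");
    }

    // The per-directory values reported by metadata-extractor for the attached sample.
    static Map<String, Map<String, String>> sample() {
        return Map.of(
                "JPEG", Map.of("Image Height", "367 pixels", "Image Width", "1535 pixels"),
                "Exif IFD0", Map.of("Image Width", "2173 pixels", "Image Height", "520 pixels"),
                "Exif SubIFD", Map.of("Exif Image Width", "1535 pixels",
                                      "Exif Image Height", "367 pixels"));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> dirs = sample();
        System.out.println("width  = " + width(dirs));   // prints "width  = 1535"
        System.out.println("height = " + height(dirs));  // prints "height = 367"
    }
}
```

Because the Exif IFD0 directory is never consulted, its 2173 x 520 values cannot leak into the
reported dimensions.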



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
