tika-dev mailing list archives

From "Olof Jonasson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1054) Problem with parsing excel date formats
Date Thu, 10 Jan 2013 13:14:12 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549593#comment-13549593 ]

Olof Jonasson commented on TIKA-1054:
-------------------------------------

I tried switching the locale on my computer and, just as you said, Excel now shows
the dates in the same format they were indexed with.
Thanks for the information.
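Here is a rough illustration of the locale effect: the same date rendered with the JDK's SHORT localized date style under a Swedish and a US locale. This is only a stand-in. It assumes, as the comment implies, that Excel renders the built-in m/d/yy date format according to the operating system's regional short-date setting. It is not the code path Excel, POI or Tika actually use, and the class name `LocaleShortDate` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class LocaleShortDate {
    // Formats a date with the locale's SHORT date style. Assumption: this
    // approximates how a spreadsheet app renders a locale-dependent date format.
    public static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2009, 10, 3);
        System.out.println("sv-SE: " + shortDate(date, Locale.forLanguageTag("sv-SE")));
        System.out.println("en-US: " + shortDate(date, Locale.US));
    }
}
```

Under en-US the date comes out as 10/3/09, which is the form Tika produced for the .xls file. The Swedish locale is expected to give a year-first form like the 2009-10-03 that Excel showed.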
                
> Problem with parsing excel date formats
> ---------------------------------------
>
>                 Key: TIKA-1054
>                 URL: https://issues.apache.org/jira/browse/TIKA-1054
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.2
>            Reporter: Olof Jonasson
>
> I'm using Solr 4.0 and Tika 1.2 and have problems indexing Excel files that contain
> date formats. I've read TIKA-125, TIKA-371, TIKA-103 and TIKA-360, and from those I get the
> impression that the date formatting problem is solved (at least for some cases).
> I used testEXCEL-formats.xls from TIKA-103, and also resaved it as .xlsx and tested that
> as well. The default locale on my computer is Swedish. This is what I get (sorry for the
> occasional Swedish: "maj" = May, "torsdag" = Thursday, "fm"/"em" = AM/PM):
> Content of testEXCEL-formats.xlsx and testEXCEL-formats.xls (as displayed in Excel):
> Number #,##0.00 1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 AM 6:15 PM
> Time Format: h:mm 06:15 18:15
> Date Format: m/d/yy 2009-10-03
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format 2008-01-19 04:35
> Custom Number: 19 dollars and ,99 cents
> Custom Date: At 4:20 AM on torsdag maj 17, 2007
> What the Tika 1.2 parser returns for the .xlsx (and what Solr indexes):
> Number #,##0.00 1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 fm 6:15 em
> Time Format: h:mm 6:15 18:15
> Date Format: m/d/yy 2009/10/03
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format 1/19/08 4:35
> Custom Number: 19,99 dollars and cents
> Custom Date: 39219.18056369212 
> What the Tika 1.2 parser returns for the .xls (and what Solr indexes):
> Number #,##0.00  1 599,99 -1 599,99
> Currency $#,##0.00;[Red]($#,##0.00) $1 599,99 ($1 599,99)
> Scientific 0.00E+00 1,98E+08 -1,98E+08
> Percentage (0.025) 3% 2,50%
> Fraction (2.5) 2 1/2
> Time Format: h:mm AM/PM 6:15 fm 6:15 em
> Time Format: h:mm  6:15 18:15
> Date Format: m/d/yy 10/3/09
> Date Format: d-mmm-yy 17-maj-07
> Date/Time Format  1/19/08 4:35
> Custom Number: 19,99 dollars and cents
> Custom Date: 39219.18056369212
> --- 
> Unexpected formats for:
> Date Format: m/d/yy 2009-10-03
> Date/Time Format 2008-01-19 04:35
> Custom Date: At 4:20 AM on torsdag maj 17, 2007
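The raw value 39219.18056369212 that Tika returns for the custom date looks like the cell's underlying Excel serial number. In the 1900 date system the integer part counts days and the fractional part is the time of day. The Java sketch below decodes it using only java.time. It assumes the workbook uses the 1900 (not the 1904) date system, and the class and method names are hypothetical. The value decodes to Thursday 17 May 2007 at about 4:20 AM, matching the Custom Date that Excel shows. This suggests the cell value itself is intact and only the formatting is lost. In real code, Apache POI's `DateUtil.getJavaDate(double)` performs this conversion.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class ExcelSerialDate {
    // Assumption: 1900 date system. From serial 61 (1900-03-01) upward, serials
    // count days from 1899-12-30; that offset absorbs Excel's fictitious 1900-02-29.
    private static final LocalDate EPOCH_1900 = LocalDate.of(1899, 12, 30);
    private static final long MILLIS_PER_DAY = 86_400_000L;

    public static LocalDateTime toLocalDateTime(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException(
                    "serials below 61 need special handling for the 1900 leap-year bug");
        }
        long days = (long) Math.floor(serial);
        // The fractional part is the time of day; round to the nearest millisecond.
        long millis = Math.round((serial - days) * MILLIS_PER_DAY);
        return EPOCH_1900.plusDays(days).atStartOfDay().plusNanos(millis * 1_000_000L);
    }

    public static void main(String[] args) {
        LocalDateTime t = toLocalDateTime(39219.18056369212);
        System.out.println(t + " (" + t.getDayOfWeek() + ")");
    }
}
```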

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
