poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 54823] New: Wrong type on Total Time field in org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties
Date Wed, 10 Apr 2013 09:27:12 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=54823

            Bug ID: 54823
           Summary: Wrong type on Total Time field in
                    org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties
           Product: POI
           Version: 3.8
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: gjorgji.josifov.for.apache@gmail.com
    Classification: Unclassified

Hello, Apache POI developers,
I got this error while parsing a Microsoft Word document with the Apache Tika
parser:

org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:125)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
    at xxx.yyyy.services.impl.LuceneServiceImpl.fillDocumentFields(LuceneServiceImpl.java:167)
    at xxx.yyyy.services.impl.LuceneServiceImpl.createLuceneDocumentForFile(LuceneServiceImpl.java:624)
    at xxx.yyyy.services.impl.LuceneServiceImpl.indexNewFile(LuceneServiceImpl.java:650)
    at $LuceneService_63044c23b5df.indexNewFile(Unknown Source)
    at $LuceneService_63044c23b5e0.advised$indexNewFile_63044c23b5fa(Unknown Source)
    at $LuceneService_63044c23b5e0$Invocation_indexNewFile_63044c23b5f9.proceedToAdvisedMethod(Unknown Source)
    at org.apache.tapestry5.internal.plastic.AbstractMethodInvocation.proceed(AbstractMethodInvocation.java:84)
    at xxx.yyyy.services.logging.LoggingAdvice.advise(LoggingAdvice.java:29)
    at org.apache.tapestry5.internal.plastic.AbstractMethodInvocation.proceed(AbstractMethodInvocation.java:86)
    at $LuceneService_63044c23b5e0.indexNewFile(Unknown Source)
    at $LuceneService_63044c23b59b.indexNewFile(Unknown Source)
    at xxx.yyyy.services.impl.IndexScheduleServiceImpl.executeDocumentActions(IndexScheduleServiceImpl.java:119)
    at xxx.yyyy.services.impl.IndexScheduleServiceImpl.access$0(IndexScheduleServiceImpl.java:76)
    at xxx.yyyy.services.impl.IndexScheduleServiceImpl$1.run(IndexScheduleServiceImpl.java:50)
    at org.apache.tapestry5.ioc.internal.services.cron.PeriodicExecutorImpl$Job.invoke(PeriodicExecutorImpl.java:178)
    at org.apache.tapestry5.ioc.internal.services.cron.PeriodicExecutorImpl$Job.invoke(PeriodicExecutorImpl.java:48)
    at org.apache.tapestry5.ioc.internal.services.ParallelExecutorImpl$1.call(ParallelExecutorImpl.java:58)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.xmlbeans.impl.values.XmlValueOutOfRangeException: Invalid int value: 4294934530
    at org.apache.xmlbeans.impl.values.JavaIntHolder.set_text(JavaIntHolder.java:43)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.update_from_wscanon_text(XmlObjectBase.java:1135)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1274)
    at org.apache.xmlbeans.impl.values.JavaIntHolder.intValue(JavaIntHolder.java:53)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getIntValue(XmlObjectBase.java:1500)
    at org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.impl.CTPropertiesImpl.getTotalTime(Unknown Source)
    at org.apache.tika.parser.microsoft.ooxml.MetadataExtractor.extractMetadata(MetadataExtractor.java:123)
    at org.apache.tika.parser.microsoft.ooxml.MetadataExtractor.extract(MetadataExtractor.java:61)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:115)
    ... 27 more

So I investigated the problem, and it seems the failure comes from line 123 in class
org.apache.tika.parser.microsoft.ooxml.MetadataExtractor:

    addProperty(metadata, OfficeOpenXMLExtended.TOTAL_TIME, propsHolder.getTotalTime());

At runtime the Total Time value is a long (4294934530 in this document, which is larger
than Integer.MAX_VALUE), but this call expects only an int.
This bug is not in Apache Tika but in the interface
org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties,
which is part of poi-ooxml-schemas 3.8 and is used by Apache Tika.
The CTProperties interface declares the return type of getTotalTime() as int,
but at runtime the value can be a long, so the return type should be changed to long.
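The range problem can be reproduced with plain JDK code, without POI or XMLBeans. The value from the exception message, 4294934530, is larger than Integer.MAX_VALUE (2147483647), so an int-typed accessor cannot return it, while a long can. This is a minimal sketch; the class and method names below are hypothetical and are not part of POI, XMLBeans, or Tika.

```java
// Hypothetical demo class; not part of POI, XMLBeans, or Tika.
public class TotalTimeRangeDemo {

    /** The TotalTime text reported in the XmlValueOutOfRangeException above. */
    public static final String RAW_TOTAL_TIME = "4294934530";

    /** Returns true if the decimal string can be parsed as a Java int. */
    public static boolean fitsInInt(String raw) {
        try {
            Integer.parseInt(raw.trim());
            return true;
        } catch (NumberFormatException e) {
            // Out of the int range (or not a number at all).
            return false;
        }
    }

    /** Parses the same text as a long, which has room for this value. */
    public static long parseAsLong(String raw) {
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in int? " + fitsInInt(RAW_TOTAL_TIME));
        System.out.println("as long: " + parseAsLong(RAW_TOTAL_TIME));
    }
}
```

Running it prints that the value does not fit in an int but parses cleanly as a long, which is the same mismatch an int return type on getTotalTime() runs into.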
My workaround: copy the classes MetadataExtractor and OOXMLExtractorFactory, override
class OOXMLParser (adding the method getUnsupportedTypes), and remove the parsing of
TOTAL_TIME, because I never use this field.
This workaround applies when you use Apache Tika to parse .docx documents.
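The core of the workaround is that one unreadable property should not abort the whole metadata extraction. A lighter-weight variant of the same idea, assuming you control the code that reads the property, is to wrap the read so that a failing getter is skipped instead of deleting the TOTAL_TIME line outright. This is a hypothetical sketch, not Tika's API: the class name, the helper name, and the map-based metadata are illustrative assumptions. It relies only on the fact that the XMLBeans exception in the trace above is unchecked, so catching RuntimeException covers it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical sketch: none of these names come from Tika or POI.
public class SkipBadPropertyDemo {

    /**
     * Reads one int-valued property and stores it in the metadata map.
     * If the getter throws an unchecked exception (such as an out-of-range
     * value error), the property is left out and false is returned.
     */
    public static boolean addPropertySafely(Map<String, String> metadata,
                                            String key, IntSupplier getter) {
        try {
            metadata.put(key, Integer.toString(getter.getAsInt()));
            return true;
        } catch (RuntimeException e) {
            // Value could not be read as an int; skip this property only.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        addPropertySafely(metadata, "Pages", () -> 3);
        addPropertySafely(metadata, "TotalTime", () -> {
            // Stands in for an int getter failing on 4294934530.
            throw new IllegalArgumentException("Invalid int value: 4294934530");
        });
        System.out.println(metadata); // prints {Pages=3}
    }
}
```

The remaining properties are still extracted; only the one that cannot be represented as an int is dropped.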
Best regards, Gjorgji
P.S. I hope my explanation was detailed enough.


