poi-dev mailing list archives

From "Javen O'Neal" <one...@apache.org>
Subject Re: [VOTE] Apache POI 3.15-beta3
Date Sat, 10 Sep 2016 07:19:47 GMT
Bug 60003 is still open and is a regression if POI should be
extracting Prague from the test slideshow.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60003
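
For anyone who wants to check for this kind of content regression outside of Tika, here is a minimal sketch. It assumes you already have the text extracted by two POI versions as plain strings; the class and method names (ExtractionDiff, tokens, missingTokens) are hypothetical and are not part of POI or Tika, and the sample strings are made up rather than taken from the test slideshow.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of POI or Tika.
final class ExtractionDiff {

    // Splits text on runs of characters that are neither letters nor digits,
    // dropping any empty tokens.
    static Set<String> tokens(String text) {
        Set<String> result = new LinkedHashSet<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Tokens present in the older extraction but absent from the newer one.
    static Set<String> missingTokens(String oldText, String newText) {
        Set<String> missing = tokens(oldText);
        missing.removeAll(tokens(newText));
        return missing;
    }

    public static void main(String[] args) {
        // Made-up strings standing in for real extractor output.
        String beta1 = "Annual report\nPrague 2016";
        String beta3 = "Annual report\n2016";
        System.out.println(missingTokens(beta1, beta3)); // prints [Prague]
    }
}
```

A non-empty result only says that a token disappeared between the two runs; it does not say whether the cause is in POI or in the calling code.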

On Fri, Sep 9, 2016 at 11:44 AM, Allison, Timothy B. <tallison@mitre.org> wrote:
> Thank you, Dominik, for catching these!  3 cheers for mass regression testing!
>
>
> I'm finally back from break and catching up on emails...
>
> -----Original Message-----
> From: Dominik Stadler [mailto:dominik.stadler@gmx.at]
> Sent: Monday, August 15, 2016 6:09 AM
> To: POI Developers List <dev@poi.apache.org>
> Subject: Re: [VOTE] Apache POI 3.15-beta3
>
> Hi,
>
> Running the regression tests for POI 3.15-beta3 against the CommonCrawl corpus is now finished, initial results are as follows:
>
> * 11966 fail because I did not add commons-collections4; I'll trigger a re-run so that the document counts correctly show the number of regressing documents
>
> * 456 times: ArrayIndexOutOfBoundsException in SprmOperation.getOperand()
>
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: *
>         at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:317)
>         at o.a.p.stress.AbstractFileHandler.handleExtractingInternal(AbstractFileHandler.java:85)
>         at o.a.p.stress.AbstractFileHandler.handleExtracting(AbstractFileHandler.java:60)
>         at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
>         at o.a.p.hwpf.sprm.SprmOperation.getOperand(SprmOperation.java:113)
>         at o.a.p.hwpf.sprm.SectionSprmUncompressor.unCompressSEPOperation(SectionSprmUncompressor.java:62)
>         at o.a.p.hwpf.sprm.SectionSprmUncompressor.uncompressSEP(SectionSprmUncompressor.java:44)
>         at o.a.p.hwpf.model.SEPX.getSectionProperties(SEPX.java:61)
>         at o.a.p.hwpf.usermodel.Section.<init>(Section.java:36)
>         at o.a.p.hwpf.usermodel.Range.getSection(Range.java:745)
>         at o.a.p.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:721)
>         at o.a.p.hwpf.extractor.WordExtractor.getText(WordExtractor.java:299)
>         ... 9 more
>
> * 4 times NullPointerException in XSLFTextParagraph.getDefaultFontSize()
>
> java.lang.NullPointerException
>         at o.a.p.xslf.usermodel.XSLFTextParagraph.getDefaultFontSize(XSLFTextParagraph.java:935)
>         at o.a.p.sl.draw.DrawTextParagraph.getAttributedString(DrawTextParagraph.java:567)
>         at o.a.p.sl.draw.DrawTextParagraph.breakText(DrawTextParagraph.java:235)
>         at o.a.p.sl.draw.DrawTextShape.drawParagraphs(DrawTextShape.java:158)
>         at o.a.p.sl.draw.DrawTextShape.getTextHeight(DrawTextShape.java:219)
>         at o.a.p.sl.draw.DrawTextShape.drawContent(DrawTextShape.java:102)
>         at o.a.p.sl.draw.DrawSimpleShape.draw(DrawSimpleShape.java:93)
>         at o.a.p.sl.draw.DrawSheet.draw(DrawSheet.java:67)
>         at o.a.p.sl.draw.DrawSlide.draw(DrawSlide.java:39)
>         at o.a.p.xslf.usermodel.XSLFSlide.draw(XSLFSlide.java:301)
>         at o.a.p.stress.SlideShowHandler.renderSlides(SlideShowHandler.java:120)
>         at o.a.p.stress.SlideShowHandler.handleSlideShow(SlideShowHandler.java:43)
>         at o.a.p.stress.XSLFFileHandler.handleFile(XSLFFileHandler.java:43)
>         at org.dstadler.commoncrawl.FileHandlingRunnable.run(FileHandlingRunnable.java:58)
>
>
>
> The others are probably flaky things where files caused OOM/Timeout before and thus were not reported with these errors before.
>
>
> See http://people.apache.org/~centic/poi_regression/reports/ and http://people.apache.org/~centic/poi_regression/reportsAll/ for detailed results.
>
>
> Thanks... Dominik.
>
>
> On Mon, Aug 15, 2016 at 4:16 AM, Javen O'Neal <onealj@apache.org> wrote:
>
>> Correction: HSLF. This is a ppt/OLE2 file.
>>
>> On Sun, Aug 14, 2016 at 6:58 PM, Javen O'Neal <onealj@apache.org> wrote:
>> > Tim,
>> >
>> > I have extracted the pptx PowerPoint file containing the Prague
>> > footer. I want to write a unit test for POI to find the Prague
>> > string so I can figure out why Prague was not included in the Tika
>> > regression test using POI 3.15 beta 3 but was found by POI 3.15 beta
>> > 1.
>> >
>> > Could you point me to the Tika code that generated the potential
>> > regressions zip file in TIKA-2013, or the POI class/function that is
>> > used to extract the text from a document?
>> >
>> > Also, is the pptx file shareable and ASL 2.0 licensed so that it can
>> > be included as part of POI's unit test suite?
>> >
>> > On Fri, Aug 12, 2016 at 6:52 PM, Javen O'Neal <javenoneal@gmail.com> wrote:
>> >> On Aug 12, 2016 11:39, "Allison, Timothy B." <tallison@mitre.org> wrote:
>> >>>...the two potential content regressions may be caused by something at the
>> >>> Tika level.  If anyone has time to take a look, that'd be great.
>> >>
>> >> I can take a look this weekend.
>> >>
>> >> Did you use the same Tika code with different POI versions for these tests
>> >> (so that we can attribute the change in behavior to a POI commit, regardless
>> >> of whether the bug is in Tika or POI)?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>> For additional commands, e-mail: dev-help@poi.apache.org
>>
>>
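
The per-exception tallies in Dominik's report above ("456 times: ArrayIndexOutOfBoundsException in SprmOperation.getOperand()") group failures by root cause and top stack frame. A rough sketch of that kind of grouping follows. This is not the tooling actually used for the CommonCrawl runs; FailureTally and its methods are hypothetical names, and the exceptions in main are constructed by hand for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not the actual regression tooling.
final class FailureTally {

    // Follows getCause() down to the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    // Builds a key such as "ArrayIndexOutOfBoundsException in SprmOperation.getOperand()"
    // from the root cause and its top stack frame.
    static String key(Throwable t) {
        Throwable root = rootCause(t);
        String name = root.getClass().getSimpleName();
        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length == 0) {
            return name;
        }
        String cls = frames[0].getClassName();
        String simple = cls.substring(cls.lastIndexOf('.') + 1);
        return name + " in " + simple + "." + frames[0].getMethodName() + "()";
    }

    // Counts how many failures share each key; TreeMap keeps the keys sorted.
    static Map<String, Integer> tally(List<? extends Throwable> failures) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Throwable t : failures) {
            counts.merge(key(t), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-built exception mimicking the shape of the trace in the report.
        Throwable cause = new ArrayIndexOutOfBoundsException("4");
        cause.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("org.apache.poi.hwpf.sprm.SprmOperation",
                    "getOperand", "SprmOperation.java", 113)
        });
        Throwable wrapped = new RuntimeException(cause);
        System.out.println(tally(Arrays.asList(wrapped, wrapped)));
    }
}
```

Keying on the root cause rather than the outer RuntimeException keeps wrapped and unwrapped occurrences of the same failure in one bucket.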


