tika-dev mailing list archives

From Anubha Balani <bal...@usc.edu>
Subject Re: [jira] [Commented] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides
Date Thu, 11 Oct 2018 20:59:57 GMT
unsubscribe

On Thu, Oct 11, 2018 at 12:49 PM Hudson (JIRA) <jira@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646969#comment-16646969 ]
>
> Hudson commented on TIKA-2735:
> ------------------------------
>
> FAILURE: Integrated in Jenkins build tika-branch-1x #113 (See [https://builds.apache.org/job/tika-branch-1x/113/])
> TIKA-2735 -- allow user to avoid extracting "master" sections and notes
> (tallison: [https://github.com/apache/tika/commit/307a8bd592d6e25419bbad19aac47cc7de201c4d])
> * (edit)
> tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
> * (edit)
> tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
> * (edit)
> tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
> * (edit)
> tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
>
>
> > notes and footer contents are duplicated in extracting text from power point slides
> >
> -----------------------------------------------------------------------------------
> >
> >                 Key: TIKA-2735
> >                 URL: https://issues.apache.org/jira/browse/TIKA-2735
> >             Project: Tika
> >          Issue Type: Bug
> >          Components: handler
> >    Affects Versions: 1.18
> >            Reporter: feng ye
> >            Priority: Major
> >         Attachments: Oneslide.ppt, pptTextResults.txt
> >
> >
> > Notes and footer contents are duplicated at the end when extracting text from ppt slides (like the one in the attachment). Both the input file and the text results are attached.
> > Is there a configuration option that can be used to suppress this kind of duplication?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


-- 
Warm Regards
Anubha Balani
