tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [VOTE] Release Apache Tika 1.15 Candidate #1
Date Tue, 23 May 2017 12:01:59 GMT
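I _think_ it is included. See below for the two options for parsing testZipEncrypted.zip. In both, the EncryptedDocumentException for encrypted.txt is recorded in the container's metadata, and unencrypted.txt is still parsed.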

Are you not seeing this behavior?  Were you expecting different behavior?  


1) RecursiveParserWrapper:

        List<Metadata> metadataList = getRecursiveMetadata("testZipEncrypted.zip");
        debug(metadataList);

yields:

0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.PackageParser
0: X-TIKA:EXCEPTION:embedded_stream_exception : org.apache.tika.exception.EncryptedDocumentException: stream (encrypted.txt) is encrypted
	at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:306)
	at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:230)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:221)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:213)
	at org.apache.tika.parser.pkg.ZipParserTest.testZipEncrypted(ZipParserTest.java:213)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

0: X-TIKA:parse_time_millis : 34
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.PackageParser" />
<meta name="Content-Type" content="application/zip" />
<title></title>
</head>
<body><div class="embedded" id="unencrypted.txt" />
<div class="package-entry"><h1>unencrypted.txt</h1>
</div>
<p>encrypted.txt</p>
</body></html>
0: Content-Type : application/zip
1: date : 2017-03-21T13:07:48Z
1: X-Parsed-By : org.apache.tika.parser.DefaultParser
1: X-Parsed-By : org.apache.tika.parser.txt.TXTParser
1: resourceName : unencrypted.txt
1: dcterms:modified : 2017-03-21T13:07:48Z
1: Last-Modified : 2017-03-21T13:07:48Z
1: Last-Save-Date : 2017-03-21T13:07:48Z
1: embeddedRelationshipId : unencrypted.txt
1: meta:save-date : 2017-03-21T13:07:48Z
1: Content-Encoding : windows-1252
1: X-TIKA:parse_time_millis : 3
1: modified : 2017-03-21T13:07:48Z
1: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2017-03-21T13:07:48Z" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.txt.TXTParser" />
<meta name="resourceName" content="unencrypted.txt" />
<meta name="dcterms:modified" content="2017-03-21T13:07:48Z" />
<meta name="Last-Modified" content="2017-03-21T13:07:48Z" />
<meta name="Last-Save-Date" content="2017-03-21T13:07:48Z" />
<meta name="embeddedRelationshipId" content="unencrypted.txt" />
<meta name="meta:save-date" content="2017-03-21T13:07:48Z" />
<meta name="Content-Encoding" content="windows-1252" />
<meta name="modified" content="2017-03-21T13:07:48Z" />
<meta name="Content-Length" content="13" />
<meta name="X-TIKA:embedded_resource_path" content="/unencrypted.txt" />
<meta name="Content-Type" content="text/plain; charset=windows-1252" />
<title></title>
</head>
<body><p>hello world
</p>
</body></html>
1: Content-Length : 13
1: X-TIKA:embedded_resource_path : /unencrypted.txt
1: Content-Type : text/plain; charset=windows-1252

2) Classic XML:

        XMLResult r = getXML("testZipEncrypted.zip");
        for (String n : r.metadata.names()) {
            for (String v : r.metadata.getValues(n)) {
                System.out.println("meta: "+n + " : "+v);
            }
        }
        System.out.println(r.xml);

yields:

meta: X-Parsed-By : org.apache.tika.parser.DefaultParser
meta: X-Parsed-By : org.apache.tika.parser.pkg.PackageParser
meta: X-TIKA:EXCEPTION:embedded_stream_exception : org.apache.tika.exception.EncryptedDocumentException: stream (encrypted.txt) is encrypted
	at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:306)
	at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:230)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
	at org.apache.tika.TikaTest.getXML(TikaTest.java:205)
	at org.apache.tika.TikaTest.getXML(TikaTest.java:191)
	at org.apache.tika.parser.pkg.ZipParserTest.testZipEncrypted(ZipParserTest.java:206)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

meta: Content-Type : application/zip
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.PackageParser" />
<meta name="Content-Type" content="application/zip" />
<title></title>
</head>
<body><div class="embedded" id="unencrypted.txt" />
<div class="package-entry"><h1>unencrypted.txt</h1>
<p>hello world
</p>

</div>
<p>encrypted.txt</p>
</body></html>
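
To check for this condition in code rather than by eye, something like the following might work. It is only a minimal sketch: it assumes each document's metadata has already been copied into a plain Java map (a stand-in for Tika's Metadata objects, so it runs without Tika on the classpath). The class and method names are hypothetical; only the `X-TIKA:EXCEPTION:` key prefix and the sample values are taken from the output above, with the stack trace left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika. Plain maps stand in for
// org.apache.tika.metadata.Metadata (an assumption made for illustration).
class EmbeddedExceptionCheck {

    // Key prefix as it appears in the debug output above.
    static final String EXCEPTION_PREFIX = "X-TIKA:EXCEPTION:";

    /** Collects the values of every metadata key that starts with the exception prefix. */
    static List<String> embeddedExceptions(Map<String, List<String>> metadata) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (entry.getKey().startsWith(EXCEPTION_PREFIX)) {
                found.addAll(entry.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Index 0: the zip container, carrying the recorded exception (stack trace omitted).
        Map<String, List<String>> container = new LinkedHashMap<>();
        container.put("Content-Type", Arrays.asList("application/zip"));
        container.put("X-TIKA:EXCEPTION:embedded_stream_exception", Arrays.asList(
                "org.apache.tika.exception.EncryptedDocumentException: "
                        + "stream (encrypted.txt) is encrypted"));

        // Index 1: the unencrypted entry, which parsed without problems.
        Map<String, List<String>> child = new LinkedHashMap<>();
        child.put("resourceName", Arrays.asList("unencrypted.txt"));
        child.put("Content-Type", Arrays.asList("text/plain; charset=windows-1252"));

        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(container);
        docs.add(child);

        // Report how many embedded exceptions each document recorded.
        for (int i = 0; i < docs.size(); i++) {
            int count = embeddedExceptions(docs.get(i)).size();
            System.out.println(i + ": " + count + " embedded exception(s)");
        }
    }
}
```

With the sample values above, the container (index 0) reports one embedded exception and the unencrypted entry (index 1) reports none.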

-----Original Message-----
From: Aeham Abushwashi [mailto:aeham.abushwashi@exonar.com] 
Sent: Tuesday, May 23, 2017 3:47 AM
To: user@tika.apache.org; Tim Allison <tallison@apache.org>
Cc: dev@tika.apache.org
Subject: Re: [VOTE] Release Apache Tika 1.15 Candidate #1

Thanks Tim and apologies if this isn't the right thread to ask this question... any reason
TIKA-2300 is not included despite FixVersions=1.15 on the ticket?

On 22 May 2017 at 20:25, Tim Allison <tallison@apache.org> wrote:

> A candidate for the Tika 1.15 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.15-rc1
>
> The SHA1 checksum of the archive is
> e82697a6804373367fbba98d47426ab74e036eb1.
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1022
>
> Please vote on releasing this package as Apache Tika 1.15.
> The vote is open for the next 72 hours and passes if a majority of at 
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.15
> [ ] -1 Do not release this package because...
>
> ***This is my first time as release manager.  Please kick the tires
> thoroughly.***
>
> This is my +1.
>
> Cheers,
>
> Tim
>



--
Aeham Abushwashi
Head of Engineering
Exonar
