tika-dev mailing list archives: September 2011

Jukka Zitting (JIRA) [jira] [Created] (TIKA-710) Make the Tika facade implement the Parser and Detector interfaces Fri, 09 Sep, 08:45
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-710) Expose the Parser and Detector instances within the Tika facade Fri, 09 Sep, 08:49
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed Fri, 09 Sep, 09:03
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-710) Expose the Parser and Detector instances within the Tika facade Fri, 09 Sep, 09:19
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed Fri, 09 Sep, 09:27
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-594) Upgrade Tika to pdfbox 1.6.0 Fri, 16 Sep, 10:36
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-605) Tika GDAL parser Sat, 17 Sep, 08:48
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after </b> tag Sat, 17 Sep, 09:25
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-703) Drop deprecated methods/classes/interfaces Sat, 17 Sep, 09:31
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-552) Further improvements to Word .doc and .docx parsing Sat, 17 Sep, 09:33
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-691) java.lang.ArrayIndexOutOfBoundsException by MS Word CDF V2 Document Sat, 17 Sep, 09:35
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-705) Valid OOXML PPT file hits InvalidFormatException thrown in POI Sat, 17 Sep, 10:05
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-711) Word parser doesn't extract optional hyphen correctly Sat, 17 Sep, 10:05
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-712) Master slide text isn't extracted Sat, 17 Sep, 10:07
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-714) Word art isn't extracted for various doc types Sat, 17 Sep, 10:07
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-598) Update HDF parser and NetCDF parser to emit minimal XHTML Sat, 17 Sep, 10:31
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-603) Tika 0.9 compiles fine but failed a unit test Sat, 17 Sep, 10:37
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-676) Boilerpipe fails Sat, 17 Sep, 10:39
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-688) Enhance content-type detector to recognize almost plain text Sat, 17 Sep, 11:53
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-719) Concurrent usage of HtmlParser causes infinite loop in HashMap Mon, 19 Sep, 17:49
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-725) Empty title element makes Tika-generated HTML documents not open in Chromium Tue, 20 Sep, 09:41
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-719) Concurrent usage of HtmlParser causes infinite loop in HashMap Tue, 20 Sep, 21:08
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-716) Upgrade apache-Mime4J to Version 0.7 Tue, 20 Sep, 23:04
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-640) RFC822Parser should configure Mime4j not to fail reading mails containing more than 1000 chars in one headers text (even if folded) Tue, 20 Sep, 23:04
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files Tue, 20 Sep, 23:06
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-709) Tika network server does not print anything in response to, for example, Word documents Thu, 22 Sep, 05:53
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-727) Improve the outputed XHTML by HSLFExtractor Thu, 22 Sep, 13:09
Jukka Zitting (JIRA) [jira] [Issue Comment Edited] (TIKA-727) Improve the outputed XHTML by HSLFExtractor Thu, 22 Sep, 13:11
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes Thu, 22 Sep, 18:09
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-552) Further improvements to Word .doc and .docx parsing Thu, 22 Sep, 18:13
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-241) Rar archive support Fri, 23 Sep, 08:42
Jukka Zitting (JIRA) [jira] [Commented] (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes Fri, 23 Sep, 22:09
Jukka Zitting (JIRA) [jira] [Updated] (TIKA-648) Parsing HTML anchors with embedded div faulty Fri, 23 Sep, 22:11
Jukka Zitting (JIRA) [jira] [Created] (TIKA-732) Upgrade to Commons Codec 1.5 Mon, 26 Sep, 15:53
Jukka Zitting (JIRA) [jira] [Resolved] (TIKA-732) Upgrade to Commons Codec 1.5 Mon, 26 Sep, 15:57
Julien Nioche Re: index video and image format with nutch 1.3? Sat, 10 Sep, 08:08
Julien Nioche Re: [VOTE] Add Any23 to the Apache Incubator Tue, 27 Sep, 09:00
Ken Krugler Request for patch review - TIKA-431 Fri, 16 Sep, 18:23
Ken Krugler Support for Open Graph meta tags Fri, 23 Sep, 00:23
Ken Krugler Re: Support for Open Graph meta tags Fri, 23 Sep, 13:06
Ken Krugler Re: Support for Open Graph meta tags Fri, 23 Sep, 14:19
Ken Krugler (JIRA) [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Mon, 12 Sep, 22:43
Ken Krugler (JIRA) [jira] [Commented] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events Thu, 15 Sep, 17:44
Ken Krugler (JIRA) [jira] [Updated] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Fri, 16 Sep, 18:15
Ken Krugler (JIRA) [jira] [Commented] (TIKA-539) Encoding detection is too biased by encoding in meta tag Fri, 16 Sep, 18:17
Ken Krugler (JIRA) [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Fri, 16 Sep, 23:03
Ken Krugler (JIRA) [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Sat, 17 Sep, 19:19
Ken Krugler (JIRA) [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Sat, 17 Sep, 22:02
Ken Krugler (JIRA) [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly. Sat, 17 Sep, 22:04
Ken Krugler (JIRA) [jira] [Assigned] (TIKA-719) Concurrent usage of HtmlParser causes infinite loop in HashMap Mon, 19 Sep, 15:35
Ken Krugler (JIRA) [jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata Fri, 23 Sep, 14:24
Ken Krugler (JIRA) [jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata Fri, 23 Sep, 14:24
Ken Krugler (JIRA) [jira] [Created] (TIKA-728) Return RDFa meta tags via Metadata Fri, 23 Sep, 14:24
Ken Krugler (JIRA) [jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata Fri, 23 Sep, 14:26
Ken Krugler (JIRA) [jira] [Commented] (TIKA-728) Return RDFa meta tags via Metadata Fri, 23 Sep, 14:26
Kevin Clark Re: 1.0 RC in next 2 weeks Thu, 15 Sep, 22:32
Kevin Clark Re: [RESULT] [VOTE] Apache Tika 0.10 release rc #1 Fri, 30 Sep, 17:25
Malik Hemani (JIRA) [jira] [Commented] (TIKA-100) Structured PDF parsing Sun, 04 Sep, 12:20
Mark Kerzner Re: [jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text Thu, 01 Sep, 15:27
Mark Kerzner Re: [jira] [Resolved] (TIKA-701) Fix problems with TemporaryFiles Thu, 01 Sep, 15:34
Mattmann, Chris A (388J) Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/main/java/org/apache/tika/parser/external/ tika-pa Thu, 01 Sep, 16:26
Mattmann, Chris A (388J) Re: [PROPOSAL] Any23 to join the incubator Sun, 04 Sep, 16:38
Mattmann, Chris A (388J) Re: 1.0 RC in next 2 weeks Thu, 15 Sep, 22:06
Mattmann, Chris A (388J) Re: 1.0 RC in next 2 weeks Fri, 16 Sep, 03:09
Mattmann, Chris A (388J) Re: 1.0 RC in next 2 weeks Fri, 16 Sep, 14:44
Mattmann, Chris A (388J) Re: Release date of tika 1.0 or 0.10 Thu, 22 Sep, 03:02
Mattmann, Chris A (388J) Re: Support for Open Graph meta tags Fri, 23 Sep, 00:51
Mattmann, Chris A (388J) Re: Support for Open Graph meta tags Fri, 23 Sep, 16:20
Mattmann, Chris A (388J) Re: Release date of tika 1.0 or 0.10 Fri, 23 Sep, 22:03
Mattmann, Chris A (388J) [NOTICE} 0.10 RC likely this evening PDT Sun, 25 Sep, 19:05
Mattmann, Chris A (388J) Re: [PROPOSAL] Any23 to join the incubator Mon, 26 Sep, 06:33
Mattmann, Chris A (388J) [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 06:50
Mattmann, Chris A (388J) Re: [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 13:37
Mattmann, Chris A (388J) Re: [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 15:44
Mattmann, Chris A (388J) Re: [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 16:14
Mattmann, Chris A (388J) [VOTE] Add Any23 to the Apache Incubator Tue, 27 Sep, 05:18
Mattmann, Chris A (388J) [RESULT] [VOTE] Apache Tika 0.10 release rc #1 Fri, 30 Sep, 17:00
Mattmann, Chris A (388J) [ANNOUNCE] Apache Tika 0.10 released Fri, 30 Sep, 18:18
Mattmann, Chris A (388J) Re: [ANNOUNCE] Apache Tika 0.10 released Fri, 30 Sep, 18:42
Maxim Valyanskiy Re: svn commit: r1165230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ test/resources/test-documents/ Mon, 05 Sep, 16:14
Maxim Valyanskiy Re: [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 11:02
Maxim Valyanskiy (JIRA) [jira] [Commented] (TIKA-708) NPE Parsing MS Word 12.0.0 Mon, 12 Sep, 15:02
Maxim Valyanskiy (JIRA) [jira] [Updated] (TIKA-708) NPE Parsing MS Word 12.0.0 Tue, 13 Sep, 07:52
Maxim Valyanskiy (JIRA) [jira] [Created] (TIKA-726) Provide a way to distinguish generic parse error and parse error due to unknown/wrong decryption key Wed, 21 Sep, 09:26
Maxim Valyanskiy (JIRA) [jira] [Resolved] (TIKA-726) Provide a way to distinguish generic parse error and parse error due to unknown/wrong decryption key Wed, 21 Sep, 09:42
Maxim Valyanskiy (JIRA) [jira] [Assigned] (TIKA-731) NPE in WordExtractor.handleParagraph() Mon, 26 Sep, 10:36
Maxim Valyanskiy (JIRA) [jira] [Resolved] (TIKA-731) NPE in WordExtractor.handleParagraph() Mon, 26 Sep, 10:40
Michael McCandless Re: svn commit: r1163336 - in /tika/trunk/tika-parsers/src/test: java/org/apache/tika/parser/rtf/ resources/test-documents/ Thu, 01 Sep, 09:42
Michael McCandless Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/main/java/org/apache/tika/parser/external/ tika-pa Thu, 01 Sep, 10:23
Michael McCandless Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/main/java/org/apache/tika/parser/external/ tika-pa Thu, 01 Sep, 15:08
Michael McCandless Re: Resource management patterns (Was: Tika leaves files open) Thu, 01 Sep, 15:13
Michael McCandless Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/main/java/org/apache/tika/parser/external/ tika-pa Fri, 02 Sep, 09:12
Michael McCandless Re: 1.0 RC in next 2 weeks Fri, 16 Sep, 15:14
Michael McCandless Re: Release date of tika 1.0 or 0.10 Fri, 23 Sep, 19:26
Michael McCandless Re: Release date of tika 1.0 or 0.10 Sat, 24 Sep, 10:30
Michael McCandless Re: Release date of tika 1.0 or 0.10 Sat, 24 Sep, 10:31
Michael McCandless Re: Release date of tika 1.0 or 0.10 Sat, 24 Sep, 12:29
Michael McCandless Re: [VOTE] Apache Tika 0.10 release rc #1 Mon, 26 Sep, 16:03
Michael McCandless Re: apache-tika-app? (Was: [VOTE] Apache Tika 0.10 release rc #1) Mon, 26 Sep, 17:11
Michael McCandless (Assigned) (JIRA) [jira] [Assigned] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException Wed, 28 Sep, 10:33