tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: MagicDetector don't work for all RFC882 message Types.
Date Thu, 11 Jul 2013 14:53:26 GMT
I think I may be uniquely qualified to answer this from an Idiot's guide/newish to Tika perspective.
:)  Apologies if I'm missing out on more obvious answers!

SVN info:
http://tika.apache.org/source-repository.html 

Generally how to contribute (Lucene has a good description):
http://wiki.apache.org/lucene-java/HowToContribute 

POI does too:
http://poi.apache.org/guidelines.html 

If you're adding binary files, I found POI's patch task to be very useful.  Grab "patch.xml"
from POI's svn and run:
ant -f patch.xml 

-----Original Message-----
From: Kai-Uwe Schmidt [mailto:kus@bel-it.de] 
Sent: Thursday, July 11, 2013 10:45 AM
To: dev@tika.apache.org
Subject: AW: MagicDetector don't work for all RFC882 message Types.

Sorry patch was meant :-/

-----Original Message-----
From: Kai-Uwe Schmidt [mailto:kus@bel-it.de] 
Sent: Thursday, 11 July 2013 16:42
To: dev@tika.apache.org
Subject: AW: MagicDetector don't work for all RFC882 message Types.

Where can I read how to provide a path? 

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org]
Sent: Thursday, 11 July 2013 12:48
To: dev@tika.apache.org
Subject: Re: MagicDetector don't work for all RFC882 message Types.

On Thu, 11 Jul 2013, Kai-Uwe Schmidt wrote:
> I am trying to use Tika to extract metadata from EML files created via 
> Novell GroupWise. In doing so I ran into a problem with the detection of 
> "message/rfc822". The MagicDetector (working with the default
> tika-mimetypes.xml) compares the "match" values byte for byte. RFC822 
> states that header field names are case independent (see 
> http://www.ietf.org/rfc/rfc0822.txt 3.4.7). So MIME-Version is the 
> same as Mime-Version

Best bet is to open a bug in jira, and upload a (small!) sample file that shows the problem.
We'll need to tweak the mime rules to include that case combination too. (IIRC, the mime magic
rules don't support case insensitive matching)

Nick
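
To illustrate the case issue discussed above, here is a minimal Java sketch. It is not Tika's MagicDetector code: the class `HeaderPrefixMatch` and both of its methods are hypothetical names made up for this example. It only contrasts an exact byte comparison (which is how the binary "match" comparison is described above) with a case-insensitive comparison of the same header name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika.
public class HeaderPrefixMatch {

    // Exact, byte-for-byte comparison of the start of the data with the prefix.
    public static boolean matchesExact(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (data[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    // Case-insensitive comparison, in the spirit of RFC 822 section 3.4.7.
    // Assumes the header name is plain ASCII.
    public static boolean matchesIgnoreCase(byte[] data, String prefix) {
        int len = Math.min(data.length, prefix.length());
        String head = new String(data, 0, len, StandardCharsets.US_ASCII);
        // regionMatches returns false if head is shorter than the prefix.
        return head.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        byte[] sample = "Mime-Version: 1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matchesExact(sample, "MIME-Version:"));      // prints false
        System.out.println(matchesIgnoreCase(sample, "MIME-Version:")); // prints true
    }
}
```

With exact matching, each case variant of the header name needs its own rule, which is why a small sample file showing the failing variant is useful.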
