tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject HTML mime-types
Date Mon, 07 Dec 2009 02:47:56 GMT
Currently the tika-config.xml file maps three mime-types to the HTML parser:

         <parser name="parse-html"  

I notice that facebook.com, if you don't specify an Accept: value in  
the request header, returns this for the mime-type:


Wondering if this should be added to the set, and if so then what  
other variants like this are floating around.

Or if we need something like "application/*.xhtml.xml" so that  
wildcards can be used in mimetype patterns.
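
To make the wildcard idea concrete, here is a minimal sketch of how a glob-style mime-type pattern could be matched. It is not Tika's API: the class `MimeWildcard` and its methods are hypothetical names, and it assumes `*` should match any run of characters within a type or subtype (never across the `/`).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: matches mime-types against
// glob-style patterns such as "application/*.xhtml.xml".
final class MimeWildcard {

    // Translate a glob into a regex. Literal text is quoted so that
    // '.' and '+' in mime-types are not treated as regex operators.
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        int star;
        while ((star = glob.indexOf('*', start)) >= 0) {
            regex.append(Pattern.quote(glob.substring(start, star)));
            // Assumption: a wildcard never spans '/', ';' or whitespace.
            regex.append("[^/;\\s]*");
            start = star + 1;
        }
        regex.append(Pattern.quote(glob.substring(start)));
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Compare only the base type, ignoring parameters like "; charset=utf-8".
    static boolean matches(String glob, String mimeType) {
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType).trim();
        return compile(glob).matcher(base).matches();
    }
}
```

With this sketch, `MimeWildcard.matches("application/*.xhtml.xml", "application/vnd.example.xhtml.xml")` would be true (the mime-type there is made up for illustration), while `text/html` would not match that pattern. Whether patterns like this belong in tika-config.xml or in the mime-type registry is a separate question.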

-- Ken

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g
