nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "TikaPlugin" by JulienNioche
Date Mon, 11 Jan 2010 16:47:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "TikaPlugin" page has been changed by JulienNioche.
http://wiki.apache.org/nutch/TikaPlugin?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  '''js''': ?
  
- '''mp3''': ?
+ '''mp3''': Nutch identifies several fields (Title, Album, Artist), whereas Tika knows only
about titles; the rest is stored as paragraphs.
  
  '''msexcel''': comparable (+ Tika is able to represent content in a structured way as XHTML tables,
which can be useful for HTML parser plugins)
  
@@ -19, +19 @@

  
  '''pdf''': comparable
  
- '''rss''': ?
+ '''rss''': Tika identifies only the MIME type but does nothing with the content
  
  '''rtf''': deactivated in Nutch for licensing reasons | works in Tika
  
  '''swf''' : not yet covered in Tika (see https://issues.apache.org/jira/browse/TIKA-337)
  
- '''text''': ?
+ '''text''': comparable
  
  '''zip''': ?
  
