tika-dev mailing list archives

From "Andreas Meier (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2602) iCalendar not properly recognized as text/calendar
Date Thu, 08 Mar 2018 09:26:00 GMT
Andreas Meier created TIKA-2602:
-----------------------------------

             Summary: iCalendar not properly recognized as text/calendar
                 Key: TIKA-2602
                 URL: https://issues.apache.org/jira/browse/TIKA-2602
             Project: Tika
          Issue Type: Improvement
            Reporter: Andreas Meier


At the moment, detection of text/calendar is handled by the following mime-type element:

{code:xml}
  <mime-type type="text/calendar">
    <magic priority="50">
      <match value="BEGIN:VCALENDAR" type="string" offset="0">
        <match value="VERSION:2.0" type="string" offset="15:30"/>
      </match>
    </magic>
    <glob pattern="*.ics"/>
    <glob pattern="*.ifb"/>
    <sub-class-of type="text/plain"/>
  </mime-type>
{code}

This recognition fails whenever VERSION:2.0 does not start within offsets 15 to 30, which in practice requires it to be the first property after BEGIN:VCALENDAR.
That is not always the case (see [https://tools.ietf.org/html/rfc5545|https://tools.ietf.org/html/rfc5545],
section 3.6, "Calendar Components"), so recognition may fail for calendar objects that start with PRODID or other properties.

Section 4, "iCalendar Object Examples", shows some of these cases:

{code}
       BEGIN:VCALENDAR
       PRODID:-//xyz Corp//NONSGML PDA Calendar Version 1.0//EN
       VERSION:2.0
       BEGIN:VEVENT
       DTSTAMP:19960704T120000Z
       UID:uid1@example.com
       ORGANIZER:mailto:jsmith@example.com
       DTSTART:19960918T143000Z
       DTEND:19960920T220000Z
       STATUS:CONFIRMED
       CATEGORIES:CONFERENCE
       SUMMARY:Networld+Interop Conference
       DESCRIPTION:Networld+Interop Conference
         and Exhibit\nAtlanta World Congress Center\n
        Atlanta\, Georgia
       END:VEVENT
       END:VCALENDAR
{code}

or

{code}
       BEGIN:VCALENDAR
       METHOD:xyz
       VERSION:2.0
       PRODID:-//ABC Corporation//NONSGML My Product//EN
       BEGIN:VEVENT
       DTSTAMP:19970324T120000Z
       SEQUENCE:0
       UID:uid3@example.com
       ORGANIZER:mailto:jdoe@example.com
       ATTENDEE;RSVP=TRUE:mailto:jsmith@example.com
       DTSTART:19970324T123000Z
       DTEND:19970324T210000Z
       CATEGORIES:MEETING,PROJECT
       CLASS:PUBLIC
       SUMMARY:Calendaring Interoperability Planning Meeting
       DESCRIPTION:Discuss how we can test c&s interoperability\n
        using iCalendar and other IETF standards.
       LOCATION:LDB Lobby
       ATTACH;FMTTYPE=application/postscript:ftp://example.com/pub/
        conf/bkgrnd.ps
       END:VEVENT
       END:VCALENDAR
{code}
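
To make the effect of the offset window concrete, here is a minimal Python sketch that emulates the rule outside Tika. It is a simplified model, not Tika's implementation: the helper names `starts_within` and `is_icalendar_current_rule` are hypothetical, the inputs are shortened to their first lines with CRLF line endings, and the sketch assumes that an offset of 15:30 means the value may begin at any byte offset from 15 through 30 inclusive.

```python
def starts_within(data: bytes, value: bytes, start: int, end: int) -> bool:
    """Return True if `value` begins at some offset in [start, end] of `data`."""
    return any(data.startswith(value, pos) for pos in range(start, end + 1))


def is_icalendar_current_rule(data: bytes) -> bool:
    """Simplified model of the current magic rule (assumption, not Tika code)."""
    return data.startswith(b"BEGIN:VCALENDAR") and starts_within(
        data, b"VERSION:2.0", 15, 30
    )


# VERSION directly after BEGIN:VCALENDAR: it starts at offset 17.
version_first = b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\nPRODID:-//ABC//EN\r\n"

# First lines of the first RFC 5545 example: PRODID comes before VERSION.
prodid_first = (
    b"BEGIN:VCALENDAR\r\n"
    b"PRODID:-//xyz Corp//NONSGML PDA Calendar Version 1.0//EN\r\n"
    b"VERSION:2.0\r\n"
)

# First lines of the second RFC 5545 example: a very short METHOD line.
short_method_first = b"BEGIN:VCALENDAR\r\nMETHOD:xyz\r\nVERSION:2.0\r\n"

print(is_icalendar_current_rule(version_first))       # True
print(is_icalendar_current_rule(prodid_first))        # False
print(is_icalendar_current_rule(short_method_first))  # True (VERSION at offset 29)
```

Under this model, a very short leading property such as METHOD:xyz still leaves VERSION:2.0 inside the window, while a typical PRODID line pushes it well past offset 30.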

I suggest either
a) widening the offset of the VERSION match from 15:30 to 15:200 or something similar (not a good
approach, since we don't know how long the PRODID might be),
or
b) adding sub-matches for CALSCALE, PRODID and METHOD. (This might still not cover everything,
since there are also x-prop and iana-prop properties. So far I can only confirm that
PRODID or METHOD occur as the first property after BEGIN:VCALENDAR.)
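
The trade-off between the two options can be sketched in Python. This is an illustrative model only, not Tika code: the function names, the 15:200 window, the 250-byte PRODID and the `X-EXAMPLE-PROP` property are all hypothetical, and the sketch assumes that sibling sub-matches are alternatives (any one of them is enough) and that an offset range is inclusive at both ends.

```python
# Property prefixes that option (b) would accept right after BEGIN:VCALENDAR.
FIRST_PROPERTIES = (b"VERSION:2.0", b"CALSCALE:", b"PRODID:", b"METHOD:")


def starts_within(data: bytes, value: bytes, start: int, end: int) -> bool:
    """Return True if `value` begins at some offset in [start, end] of `data`."""
    return any(data.startswith(value, pos) for pos in range(start, end + 1))


def is_icalendar_option_a(data: bytes, window_end: int = 200) -> bool:
    """Option (a): keep the VERSION sub-match but widen its offset window."""
    return data.startswith(b"BEGIN:VCALENDAR") and starts_within(
        data, b"VERSION:2.0", 15, window_end
    )


def is_icalendar_option_b(data: bytes) -> bool:
    """Option (b): accept any of several known properties in the 15:30 window."""
    return data.startswith(b"BEGIN:VCALENDAR") and any(
        starts_within(data, prop, 15, 30) for prop in FIRST_PROPERTIES
    )


# Hypothetical input with a very long PRODID value before VERSION.
long_prodid = b"BEGIN:VCALENDAR\r\nPRODID:" + b"x" * 250 + b"\r\nVERSION:2.0\r\n"

# Hypothetical input whose first property is a non-standard x-prop.
x_prop_first = b"BEGIN:VCALENDAR\r\nX-EXAMPLE-PROP:demo\r\nVERSION:2.0\r\n"

for name, sample in (("long PRODID", long_prodid), ("x-prop first", x_prop_first)):
    print(name, is_icalendar_option_a(sample), is_icalendar_option_b(sample))
```

In this model, option (a) misses the long PRODID but accepts the x-prop case, while option (b) does the opposite, which matches the caveats noted for each option.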


Regards

Andreas



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
