tika-dev mailing list archives

From Alexander Sherbakov <alexander.sherba...@dsr-company.com>
Subject [Tika Parser 0.9] Errors in parsing of mp3 files
Date Tue, 02 Aug 2011 12:45:27 GMT
Hello,

We've found two errors when parsing mp3 files with tika-0.9:

  * *ArrayIndexOutOfBoundsException: *

    This exception occurs in the constructors of
    org.apache.tika.parser.mp3.ID3v22Handler,
    org.apache.tika.parser.mp3.ID3v23Handler and
    org.apache.tika.parser.mp3.ID3v24Handler.
    It is related to the TCON tag and the genre lookup:

        /genre = ID3Tags.GENRES[genreID];/

    However, genreID can be outside the bounds of the GENRES array.
    The fix is the following:

        /genre = ID3Tags.GENRES[Math.min(genreID, ID3Tags.GENRES.length - 1)];/

  * *NegativeArraySizeException *

    This exception occurs in the constructor of
    org.apache.tika.parser.mp3.ID3v2Frame.RawTag.
    It is related to the data size parameter:

        /rawSize = getInt(frameData, offset+nameLength);
        ...
        int size = rawSize * sizeMultiplier;
        size = Math.min(size, frameData.length-copyFrom);
        data = new byte[size];/

    It turns out that, for some of my mp3 files, rawSize has a negative
    value at this point. The content of those files may be malformed.
    One possible workaround is the following:

        /size = Math.min(size, frameData.length-copyFrom);
        *size = Math.max(size, 0);*
        data = new byte[size];/

    The underlying problem may be deeper, but this modification prevents
    the exception.
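
A minimal stand-alone sketch of the two guards described above, for illustration only. The class Id3GuardSketch, its methods lookupGenre and copyFrameData, and the short GENRES table are hypothetical stand-ins, not the actual Tika classes. Two choices here are assumptions that differ from the fixes in the message: an out-of-range genre ID (including a negative one) yields null instead of being clamped to the last entry, and the size is computed as a long so that rawSize * sizeMultiplier cannot overflow before it is clamped.

```java
// Hypothetical stand-alone sketch; not the actual Tika source.
final class Id3GuardSketch {

    // Stand-in for ID3Tags.GENRES; the real table is much longer.
    static final String[] GENRES = { "Blues", "Classic Rock", "Country" };

    // Returns the genre name for genreID, or null when the ID is
    // negative or past the end of the table.
    static String lookupGenre(int genreID) {
        if (genreID < 0 || genreID >= GENRES.length) {
            return null;
        }
        return GENRES[genreID];
    }

    // Copies the frame payload starting at copyFrom. The requested size
    // (rawSize * sizeMultiplier) is clamped to the range [0, bytes available],
    // so a negative or oversized rawSize cannot cause
    // NegativeArraySizeException or an out-of-bounds copy.
    static byte[] copyFrameData(byte[] frameData, int copyFrom,
                                int rawSize, int sizeMultiplier) {
        long requested = (long) rawSize * sizeMultiplier; // long avoids int overflow
        int available = Math.max(frameData.length - copyFrom, 0);
        int size = (int) Math.max(0L, Math.min(requested, (long) available));
        byte[] data = new byte[size];
        if (size > 0) {
            System.arraycopy(frameData, copyFrom, data, 0, size);
        }
        return data;
    }
}
```

With this sketch, a frame whose rawSize is negative produces an empty data array rather than an exception, which matches the effect of the Math.max(size, 0) workaround.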


I've attached a .diff file with my changes.
I hope it will be useful for the next patch.

--
Best regards,
   Alexander

Alexander Shcherbakov | Software Engineer | DSR Company | e-mail:
alexander.sherbakov@dsr-company.com | skype: shcherbakov.alexander
