tika-dev mailing list archives

From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1007) Improve Concurrency of ParsingReader
Date Sun, 14 Oct 2012 20:31:02 GMT
Luis Filipe Nassif created TIKA-1007:

             Summary: Improve Concurrency of ParsingReader
                 Key: TIKA-1007
                 URL: https://issues.apache.org/jira/browse/TIKA-1007
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.2
         Environment: jre 1.7.0_05 x64, Windows 7 Enterprise x64
            Reporter: Luis Filipe Nassif
         Attachments: FastPipedReader.java, FastPipedWriter.java, ModifiedParsingReader.java,
ModifiedParsingReaderTest.java, ParsingReaderTest.java

As discussed in TIKA-885, the PipedReader and PipedWriter classes have a bug that prevents
them from running concurrently: they notify each other only when the pipe is full or
empty, not after each char is read from or written to the pipe. This limits the concurrency
of the reader and writer sides of ParsingReader. Run the attached ParsingReaderTest.java
and you will see that only one processor is used (25% CPU on my quad-core machine). So I modified
ParsingReader to use modified versions of PipedReader and PipedWriter that work concurrently.
Run the attached ModifiedParsingReaderTest.java and you will see that 2 processors
are used (50% on my machine). The attached FastPipedReader.java and FastPipedWriter.java are
for demonstration purposes only, because I took the base code from the net and changed it,
so they may be subject to license restrictions.
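
To illustrate the idea of notifying the other side after every char rather than only
when the pipe is full or empty, here is a minimal sketch. It is not the attached
FastPipedReader/FastPipedWriter code; the class name NotifyingCharPipe and its methods
are hypothetical, and it assumes a single shared bounded buffer guarded by one monitor.

```java
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical bounded char pipe (illustration only, not the attached classes).
 * Every write and every read calls notifyAll(), so the peer thread is woken
 * as soon as there is something to consume or room to produce.
 */
final class NotifyingCharPipe {
    private final char[] buffer;
    private int head;             // index of the next char to read
    private int count;            // number of chars currently buffered
    private boolean writerClosed;

    NotifyingCharPipe(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        buffer = new char[capacity];
    }

    /** Blocks only while the buffer is full; wakes the reader after each char. */
    synchronized void write(char c) {
        if (writerClosed) {
            throw new IllegalStateException("writer side is closed");
        }
        while (count == buffer.length) {
            awaitChange();
        }
        buffer[(head + count) % buffer.length] = c;
        count++;
        notifyAll(); // notify after every char, not only when the pipe fills up
    }

    /** Returns the next char, or -1 once the writer has closed and the buffer is drained. */
    synchronized int read() {
        while (count == 0) {
            if (writerClosed) {
                return -1;
            }
            awaitChange();
        }
        char c = buffer[head];
        head = (head + 1) % buffer.length;
        count--;
        notifyAll(); // notify after every char, not only when the pipe empties
        return c;
    }

    /** Signals end of stream to the reader. */
    synchronized void closeWriter() {
        writerClosed = true;
        notifyAll();
    }

    // Must be called while holding this object's monitor.
    private void awaitChange() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new UncheckedIOException(
                new InterruptedIOException("interrupted while waiting on pipe"));
        }
    }
}
```

With a pipe like this, a parser thread calling write() and a consumer thread calling
read() can make progress at the same time. Whether per-char notification is the best
trade-off (versus notifying per block) is a design choice this sketch does not measure.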

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
