tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader
Date Wed, 01 Aug 2012 19:30:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426842#comment-13426842 ]

Jukka Zitting commented on TIKA-885:

What I had in mind was something like a {{Metadata.copyFrom(Metadata)}} method that would
copy all metadata from one instance to another. We'd then have three {{Metadata}} instances,
one for the client, one for the parser and a shared one for passing updates from the parser
to the client. Each {{write()}} in the background parser would do something like:

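    synchronized (sharedMetadata) {
        // parserMetadata is an assumed name for the parser's own instance
        sharedMetadata.copyFrom(parserMetadata);
    }

... and each {{read()}} by the client would do:

    synchronized (sharedMetadata) {
        // clientMetadata is an assumed name for the client's own instance
        clientMetadata.copyFrom(sharedMetadata);
    }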

It's not terribly elegant, but should avoid the need to make all {{Metadata}} instances thread-safe.
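
A minimal, self-contained sketch of the three-instance handoff described above may make the idea more concrete. It does not use the real Tika classes: {{SimpleMetadata}}, {{MetadataHandoff}}, {{parserSet()}} and {{clientView()}} are hypothetical names, values are simplified to single strings, and {{copyFrom()}} is the proposed method, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.tika.metadata.Metadata.
// Like the real class, it is NOT thread-safe on its own.
class SimpleMetadata {
    private final Map<String, String> values = new HashMap<String, String>();

    void set(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }

    // The proposed copyFrom: copy every entry of 'other' into this instance.
    void copyFrom(SimpleMetadata other) { values.putAll(other.values); }
}

// Hypothetical holder for the three instances: parser-only, client-only, shared.
class MetadataHandoff {
    private final SimpleMetadata parserMetadata = new SimpleMetadata(); // parser thread only
    private final SimpleMetadata clientMetadata = new SimpleMetadata(); // client thread only
    private final SimpleMetadata sharedMetadata = new SimpleMetadata(); // accessed only under its lock

    // Parser thread: update the private instance, then publish it under the lock.
    void parserSet(String name, String value) {
        parserMetadata.set(name, value);
        synchronized (sharedMetadata) {
            sharedMetadata.copyFrom(parserMetadata);
        }
    }

    // Client thread: pull published updates under the lock, then read lock-free.
    SimpleMetadata clientView() {
        synchronized (sharedMetadata) {
            clientMetadata.copyFrom(sharedMetadata);
        }
        return clientMetadata;
    }
}
```

Only the shared instance is ever touched by both threads, and always inside the {{synchronized}} block, so the parser's and the client's own instances need no locking. In this sketch the copy happens on every update; in the proposal it would happen once per {{write()}} and {{read()}}.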

bq. customized versions of PipedReader and PipedWriter classes that work concurrently

I'm not sure I understand. Perhaps you could describe the idea in more detail either on the
dev@ list or in a separate improvement issue.
> Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader
> -------------------------------------------------------------------------------------------
>                 Key: TIKA-885
>                 URL: https://issues.apache.org/jira/browse/TIKA-885
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.0
>         Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>              Labels: patch
> Oracle's PipedReader and PipedWriter classes have a bug that prevents them from executing
> concurrently: they notify each other only when the pipe is full or empty, and not
> after each char is read from or written to the pipe. So I changed ParsingReader to use modified versions
> of PipedReader and PipedWriter, similar to the GNU versions, that work concurrently. However,
> sometimes, with certain files, I get the following error:
> java.util.ConcurrentModificationException
>                 at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>                 at java.util.HashMap$KeyIterator.next(Unknown Source)
>                 at java.util.AbstractCollection.toArray(Unknown Source)
>                 at org.apache.tika.metadata.Metadata.names(Metadata.java:146)
> This happens because the ParsingReader.ParsingTask thread writes metadata while it is being
> read by the ParsingReader thread, for files that contain metadata beyond their initial bytes.
> It does not occur with the current implementation, because Java's PipedReader and PipedWriter
> block each other, which is a performance bug that affects ParsingReader, but they could be fixed
> in a future Java release. I think it would be a defensive approach to synchronize access to the private
> Metadata.metadata Map, which could avoid a possible future problem when using ParsingReader.
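
For comparison, the reporter's suggestion of synchronizing access to the private map could look roughly like the sketch below. {{SynchronizedMetadata}} is a hypothetical class, only loosely modeled on Tika's {{Metadata}}, and is not the actual Tika source. The point it illustrates is that {{names()}} takes its key snapshot under the same lock as {{set()}}, so the {{HashMap}} iterator behind {{toArray()}} cannot observe a concurrent modification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Tika's Metadata class with a lock-guarded private map.
class SynchronizedMetadata {
    private final Map<String, String[]> metadata = new HashMap<String, String[]>();

    // Writers hold the instance lock while mutating the map.
    public synchronized void set(String name, String value) {
        metadata.put(name, new String[] { value });
    }

    // Returns the first value for the name, or null if the name is absent.
    public synchronized String get(String name) {
        String[] values = metadata.get(name);
        return values == null ? null : values[0];
    }

    // The key snapshot is built under the same lock, so a concurrent set()
    // cannot invalidate the iterator used internally by toArray().
    public synchronized String[] names() {
        return metadata.keySet().toArray(new String[0]);
    }
}
```

The trade-off is the one raised in the comment above: every {{Metadata}} instance pays for locking, including those only ever used by a single thread.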

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira