lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker
Date Tue, 08 Jan 2008 13:56:34 GMT


Grant Ingersoll commented on LUCENE-1117:

I've also noticed that the process doesn't exit when an exception is thrown (such as
the one in the issue description), which I think is because the thread doesn't stop.
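Here is a minimal sketch of one way to avoid that kind of hang. It assumes a simple hand-off in which the parser thread produces documents and the consumer pulls them with next(). The class FailFastHandOff and its methods are hypothetical and are not part of EnwikiDocMaker. The producer thread records any exception it dies with, and next() rethrows that exception instead of waiting forever for documents that will never arrive:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical hand-off between a parser thread and its consumer (not
 * Lucene's code). A producer failure is recorded and rethrown from next(),
 * so the consumer fails fast instead of waiting for documents forever.
 */
class FailFastHandOff {
    private final Deque<String> docs = new ArrayDeque<>();
    private boolean done;       // producer finished, normally or not
    private Throwable failure;  // non-null if the producer died with an exception

    /** Called by the producer for each parsed document. */
    public synchronized void put(String doc) {
        docs.addLast(doc);
        notifyAll();
    }

    /** Called once by the producer when it stops; t is null on success. */
    public synchronized void finish(Throwable t) {
        failure = t;
        done = true;
        notifyAll();            // wake any consumer blocked in next()
    }

    /** Returns the next document, null at normal end, or rethrows the producer's failure. */
    public synchronized String next() {
        while (docs.isEmpty() && !done) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        if (!docs.isEmpty()) {
            return docs.removeFirst();   // drain what was parsed before the failure
        }
        if (failure != null) {
            throw new RuntimeException("parser thread failed", failure);
        }
        return null;
    }

    /** Runs body on a daemon thread and always reports how it ended. */
    public Thread startProducer(Runnable body) {
        Thread t = new Thread(() -> {
            try {
                body.run();
                finish(null);
            } catch (Throwable e) {
                finish(e);      // never leave the consumer waiting on a dead producer
            }
        });
        t.setDaemon(true);      // a producer stuck in a blocking read cannot keep the JVM alive
        t.start();
        return t;
    }
}
```

Making the thread a daemon handles the case where it is stuck rather than dead. Reporting the failure handles the case where the consumer is the thread keeping the process alive.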

> Intermittent thread safety issue with EnwikiDocMaker
> ----------------------------------------------------
>                 Key: LUCENE-1117
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/benchmark
>    Affects Versions: 2.2, 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>         Attachments: LUCENE-1117.patch
> When I run conf/wikipediaOneRound.alg, sometimes it starts OK, but
> other times (about a third of the time) I see this:
>      Exception in thread "Thread-0" java.lang.RuntimeException: Bad file descriptor
>      	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$
>      	at
>      Caused by: Bad file descriptor
>      	at Method)
>      	at
>      	at org.apache.xerces.impl.XMLEntityManager$ Source)
>      	at Source)
>      	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>      	at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
>      	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>      	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>      	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>      	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>      	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>      	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>      	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>      	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$
>      	... 1 more
> The problem is that the thread that pulls the XML docs is started as
> soon as the EnwikiDocMaker class is instantiated.  When it starts, it
> uses fileIS (a FileInputStream) to feed the XML parser.  But openFile
> is actually called twice when the alg starts if you use any task
> deriving from ResetInputsTask, and the second call closes the original
> fileIS that the XML parser may still be using.
> I changed the thread to start on demand the first time next() is
> called instead.  I also removed a redundant resetInputs() call (which
> was opening the file more often than needed).  Finally, I added logic
> in the thread to detect that the input stream was closed (because
> LineDocMaker.resetInputs() was called, e.g., if we are not running the
> doc maker to exhaustion).
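Below is a minimal sketch of the two changes described in the fix. It is not the code in LUCENE-1117.patch, and it makes several assumptions. One line of text stands in for one XML document. LazyLineSource and its `opener` Supplier are hypothetical names. resetInputs() here stands in for the doc maker's reset.

The sketch makes two changes. First, the reader thread starts on the first next() call rather than in the constructor. Second, when resetInputs() closes the stream, the old thread exits quietly instead of reporting an error:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not the LUCENE-1117 patch): one line = one "doc".
 * The reader thread starts on the first next() call, and a stream closed by
 * resetInputs() ends the old thread quietly instead of raising an error.
 */
class LazyLineSource {
    private final Supplier<InputStream> opener; // assumption: opens a fresh stream each call
    private final Deque<String> docs = new ArrayDeque<>();
    private InputStream in;     // stream owned by the current reader thread
    private Thread reader;      // null until next() starts it
    private boolean eof;        // current stream fully read (or failed)
    private Throwable failure;  // unexpected read error, rethrown by next()

    public LazyLineSource(Supplier<InputStream> opener) {
        this.opener = opener;   // nothing is opened or started here
    }

    public synchronized String next() {
        if (reader == null) {
            start();            // start on demand, not in the constructor
        }
        while (docs.isEmpty() && !eof) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        if (docs.isEmpty() && failure != null) {
            throw new RuntimeException("reader thread failed", failure);
        }
        return docs.pollFirst(); // null once the input is exhausted
    }

    /** Closes the current stream; next() will reopen it and start a new reader. */
    public synchronized void resetInputs() {
        try {
            if (in != null) {
                in.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        in = null;
        reader = null;
        docs.clear();
        eof = false;
        failure = null;
    }

    private void start() {      // called with the lock held
        InputStream stream = opener.get();
        in = stream;
        reader = new Thread(() -> read(stream));
        reader.setDaemon(true); // a stuck reader cannot keep the JVM alive
        reader.start();
    }

    private void read(InputStream stream) {
        Throwable error = null;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                synchronized (this) {
                    if (in != stream) {
                        return; // resetInputs() replaced our stream: stop quietly
                    }
                    docs.addLast(line);
                    notifyAll();
                }
            }
        } catch (IOException e) {
            error = e;
        }
        synchronized (this) {
            if (in != stream) {
                return;         // our stream was closed on purpose: not a failure
            }
            failure = error;
            eof = true;
            notifyAll();
        }
    }
}
```

Each reader thread compares its own stream with the current one while holding the same lock. That comparison lets it tell a deliberate close from a real I/O error. This is one possible design and not necessarily the one the patch uses.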
