tika-dev mailing list archives

From "Bob Paulin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1762) Create Executor Service from TikaConfig
Date Thu, 15 Oct 2015 22:43:05 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959775#comment-14959775
] 

Bob Paulin commented on TIKA-1762:
----------------------------------

Hey Nick,

Just so we're clear on what's being proposed: are you thinking of something like this for
the ExternalParser?

{code}
public class ExternalParser extends AbstractParser {
    
    private final ExecutorService executorService;
    
    public ExternalParser() {
        this.executorService = Executors.newSingleThreadExecutor();
    }
    
    public ExternalParser(TikaConfig config)
    {
        this.executorService = config.getExecutorService();
    }   
    
    public ExternalParser(ExecutorService executorService)
    {
        this.executorService = executorService;
    }
...
{code}

This at least provides a default.  For example, the following:
{code}
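private void sendInput(final Process process, final InputStream stream) {
    // Copy the input to the child process's stdin on a dedicated thread.
    Thread t = new Thread() {
        public void run() {
            OutputStream stdin = process.getOutputStream();
            try {
                IOUtils.copy(stream, stdin);
            } catch (IOException e) {
                // Copy failures are silently ignored.
            }
        }
    };
    t.start();
    try {
        // Block until the copy has finished.
        t.join();
    } catch (InterruptedException ignore) {
    }
}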
{code}

Could be replaced with:
{code}
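private void sendInput(final Process process, final InputStream stream) {
    Future<?> result = executorService.submit(new Runnable() {
        public void run() {
            OutputStream stdin = process.getOutputStream();
            try {
                IOUtils.copy(stream, stdin);
            } catch (IOException e) {
                // Copy failures are silently ignored.
            }
        }
    });
    try {
        // Block until the copy has finished.
        result.get();
    } catch (InterruptedException ignore) {
    } catch (ExecutionException ignore) {
        // Future.get() also declares this checked exception.
    }
}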
{code}

The approach I was originally thinking of would not require an executor service to be set
as a default, but would replace the Thread with:
{code}
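private void sendInput(final Process process, final InputStream stream) {
    Future<?> result = ConcurrentUtils.execute(new Runnable() {
        public void run() {
            OutputStream stdin = process.getOutputStream();
            try {
                IOUtils.copy(stream, stdin);
            } catch (IOException e) {
                // Copy failures are silently ignored.
            }
        }
    });
    try {
        // Block until the copy has finished.
        result.get();
    } catch (InterruptedException ignore) {
    } catch (ExecutionException ignore) {
        // Future.get() also declares this checked exception.
    }
}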
{code}
The code above is not as elegant, but it doesn't require a default thread pool at instantiation.
The ConcurrentUtils method could also be modified to take the ExecutorService directly, so
that it could come from the Parser's field instead of only from the ParseContext.  A null
ExecutorService could then fall back to just creating threads.  Any opinions on which is worse:
spinning up a number of infrequently used threads in a pool, or creating threads on the fly
the way we do now?
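
A minimal sketch of that null fallback, assuming a hypothetical helper class named
ExecutorFallback (the class name, method name, and thread name are illustrative only and
are not the actual ConcurrentUtils API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Hypothetical helper for illustration; not part of Tika.
public final class ExecutorFallback {

    private ExecutorFallback() {
    }

    /**
     * Runs the task on the given executor service, or on a freshly created
     * thread when executorService is null. Either way the caller gets a
     * Future it can wait on.
     */
    public static Future<?> execute(ExecutorService executorService, Runnable task) {
        if (executorService != null) {
            return executorService.submit(task);
        }
        // Fallback: wrap the task in a FutureTask and run it on a new thread.
        FutureTask<Void> future = new FutureTask<>(task, null);
        Thread thread = new Thread(future, "executor-fallback");
        thread.start();
        return future;
    }
}
```

With something like this, sendInput could call ExecutorFallback.execute(executorService, runnable)
and then result.get() regardless of whether a pool was configured, so a parser built without
an ExecutorService would behave much as it does today.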

> Create Executor Service from TikaConfig
> ---------------------------------------
>
>                 Key: TIKA-1762
>                 URL: https://issues.apache.org/jira/browse/TIKA-1762
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Bob Paulin
>             Fix For: 1.11
>
>
> Create an executor service that is configurable from the TikaConfig.
>  Konstantin Gribov added a comment - 23/Sep/15 09:55
> Bob Paulin, I have two ideas on the issue:
>     1. by default use a common thread pool, configured via and contained in TikaConfig, as
>        Tyler Palsulich suggested,
>     2. you can pass a thread pool for parser invocation via ParseContext, with fallback to the
>        default if there is no thread pool/executor service in the context.
> Also, o.a.tika.Tika#parse(InputStream, Metadata) produces an o.a.tika.parser.ParsingReader
> and an anonymous Executor with unbounded daemon thread creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
