uima-dev mailing list archives

From "Miguel Alvarez" <miguelal...@gmail.com>
Subject RE: Ruta: reloadScript and external engines
Date Tue, 09 Feb 2016 18:45:19 GMT

-----Original Message-----
From: Peter Klügl [mailto:peter.kluegl@averbis.com] 
Sent: February 8, 2016 2:50
To: dev@uima.apache.org
Subject: Re: Ruta: reloadScript and external engines


If there were files attached to this mail, they have been removed. You could attach them to
the Jira issue.



Am 05.02.2016 um 17:00 schrieb Miguel Alvarez:
> Find attached the source files...
> I assume the extensions don't support all the types of parameters (unless you tell me
> otherwise :-) ), so I had to convert the resource parameters to simple string expressions,
> and the feature assignments to string/integer parameter pairs...
> -----Original Message-----
> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
> Sent: February 5, 2016 7:23
> To: dev@uima.apache.org
> Subject: Re: Ruta: reloadScript and external engines
> hi,
> Ah, OK. I must admit that I was not sure how this issue could be implemented, since the connection
> to the file is lost. The wordlist resources are in need of refactoring. I am curious how you
> solved it.
> Best,
> Peter
> Am 05.02.2016 um 16:03 schrieb Miguel Alvarez:
>> I created two extensions that are replicas of MARKFAST and
>> MARKTABLE but keep the last-modified date of the txt or csv files and
>> reload them if it changes. They work well, and this way I don't need
>> to have reloadScript set to true.
>> I created a Jira enhancement for this.
>> Cheers
>> Miguel
>> On Feb 2, 2016 06:11, "Miguel Alvarez" <miguelal007@gmail.com> wrote:
>>> Hi Peter
>>> Thanks again for your reply. I am trying to get more familiar with 
>>> the RUTA code so I can make these changes myself, but I am not there 
>>> yet, there are a lot of things I still don't understand :) I will 
>>> create a Jira issue for now.
>>> I already tried the workaround you suggested yesterday, but
>>> it doesn't seem to work either. I refactored the code so
>>> the dictionary logic is in separate scripts and set the reloadScript
>>> parameter to true only for those scripts. The rest of the scripts,
>>> including the main script (the one that starts the chain of
>>> scripts), have it set to false. It seems that unless I set all the
>>> scripts to true, the dictionaries don't get reloaded.
>>> But now that I am thinking about it, I should be able to create
>>> extensions that allow us to reload the dictionaries only, and that
>>> way we can leave that parameter set to false for all the engines.
>>> I will let you know if that works.
>>> Cheers
>>> Miguel
>>> On Feb 1, 2016 23:54, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>> Hi,
>>>> the analysis engines do not necessarily need to be created anew. If 
>>>> you want, you can create a jira issue for it and I will fix it.
>>>> There is currently no option to reload the dictionaries separately. 
>>>> As a workaround, you could extract/refactor all dictionary logic to
>>>> a separate Ruta analysis engine/script and then set the
>>>> parameter to true only for this one.
>>>> Best,
>>>> Peter
>>>> Am 01.02.2016 um 20:04 schrieb Miguel Alvarez:
>>>>> Hi Peter,
>>>>> Thanks for your reply.
>>>>> It isn't necessarily causing any problems; I just wanted to understand how
>>>>> it was meant to work. But I can explain my situation a bit better,
>>>>> and maybe you have a better suggestion.
>>>>> We are currently setting the parameter reloadScript to true in our
>>>>> RUTA engines so the dictionaries reload without us having to restart the
>>>>> service.
>>>>> But we have some external engines, invoked from RUTA scripts, which create
>>>>> connections to other servers, and until now we have been storing
>>>>> these connections as class instance variables in our external engines so they can
>>>>> be reused and the engine doesn't need to create a new connection for every
>>>>> document processed. The initialize method checks whether the
>>>>> engine instance already has an open connection, so no matter how
>>>>> many times the initialize method is invoked, only one connection is established.
>>>>> But if we invoke this external engine from a RUTA script that has
>>>>> the reloadScript parameter set to true, a new instance of the engine is created
>>>>> for every document processed, and therefore a new connection to the remote
>>>>> server is established for each document too, regardless of my check for
>>>>> an existing connection in the initialize method (obviously, because it is a
>>>>> brand new instance every time).
>>>>> I guess one question I have is: Can we force the reload of 
>>>>> dictionaries directly from the RUTA script or in any other way?
>>>>> Thanks,
>>>>> Miguel
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
>>>>> Sent: February 1, 2016 10:00
>>>>> To: dev@uima.apache.org
>>>>> Subject: Re: Ruta: reloadScript and external engines
>>>>> Hi,
>>>>> yes, this is correct and intended, but it is missing the original idea and
>>>>> thus it is probably not necessary.
>>>>> The reload of the script needs to be supported for some special use cases
>>>>> like changing the rules during the pipeline processing. The
>>>>> additional analysis engines are, however, directly specified in the
>>>>> configuration parameters and should only be changed with reconfigure().
>>>>> Does this cause problems? I would not change it right now because I am also
>>>>> thinking about removing these parameters altogether in a future major
>>>>> release, since they are redundant and could be induced using the script. The
>>>>> objects could be cached, but the initialize is normally the expensive part.
>>>>> Best,
>>>>> Peter
>>>>> Am 30.01.2016 um 23:33 schrieb Miguel Alvarez:
>>>>>> Hi Peter,
>>>>>> I have another question about external engines :-) When the
>>>>>> reloadScript parameter is set to false, only one instance of the
>>>>>> external engine is created and the initialize method is invoked
>>>>>> only once before processing all the CASes. This is what I was expecting.
>>>>>> But when reloadScript is set to true, the initialize method of
>>>>>> the external engines is invoked once per CAS, as the
>>>>>> documentation indicates, but it looks like a new instance of the
>>>>>> external engine is created for each CAS too. Is this the expected behaviour?
>>>>>> I was expecting RUTA to create just one instance of the
>>>>>> engine, and then invoke the initialize method on that instance
>>>>>> once per CAS, but I couldn't find any information about this on the
>>>>>> Thanks again.
>>>>>> Cheers,
>>>>>> Miguel
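
The reload-on-change idea described in the thread (replicas of MARKFAST and MARKTABLE that remember the last-modified date of the txt or csv file and reload it when that date changes) could look roughly like the sketch below. The actual extension sources were attached to the Jira issue and are not reproduced in the thread, so this is only an illustration under assumptions: the class name ReloadingWordList, the one-entry-per-line file format, and the loadCount() helper are all hypothetical, and nothing here uses or represents the Ruta extension API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of Ruta: caches a word list and reloads it when the file changes. */
class ReloadingWordList {
    private final Path file;
    private FileTime lastLoaded;                       // timestamp seen at the last load, null before the first load
    private Set<String> entries = Collections.emptySet();
    private int loadCount = 0;                         // how often the file was actually read

    ReloadingWordList(Path file) {
        this.file = file;
    }

    /** Returns the cached entries, re-reading the file only if its last-modified time differs. */
    synchronized Set<String> entries() throws IOException {
        FileTime modified = Files.getLastModifiedTime(file);
        if (!modified.equals(lastLoaded)) {
            Set<String> fresh = new LinkedHashSet<>();
            // Assumption: one dictionary entry per line, blank lines ignored.
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    fresh.add(trimmed);
                }
            }
            entries = Collections.unmodifiableSet(fresh);
            lastLoaded = modified;
            loadCount++;
        }
        return entries;
    }

    synchronized int loadCount() {
        return loadCount;
    }
}
```

An action built on such a helper would call entries() once per CAS; the file is read again only after it has been modified, so reloadScript can stay false for all engines, which is the outcome Miguel reports.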

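The connection problem from the earliest messages can be reproduced without UIMA. The sketch below uses only hypothetical stand-ins (ConnectionDemo, FakeConnection, InstanceScopedEngine, SharedConnectionEngine): a guard on an instance field prevents duplicate connections only while the same engine instance is reused, which is what happens with reloadScript set to false. When a new instance is created per CAS, the guard never sees the earlier connection. The static holder in the second engine is an assumption added for illustration and is not something proposed in the thread, where Peter only notes that the objects could be cached.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo classes; none of them are UIMA or Ruta types. */
class ConnectionDemo {
    /** Counts how many fake connections have been opened. */
    static final AtomicInteger opened = new AtomicInteger();

    /** Stand-in for an expensive connection to a remote server. */
    static class FakeConnection {
        FakeConnection() {
            opened.incrementAndGet();
        }
    }

    /** Mirrors the setup in the thread: the connection lives in an instance field. */
    static class InstanceScopedEngine {
        private FakeConnection connection;

        void initialize() {
            // Guards only against repeated initialize() calls on the same instance.
            if (connection == null) {
                connection = new FakeConnection();
            }
        }
    }

    /** Assumed alternative: the connection is held statically and shared by all instances. */
    static class SharedConnectionEngine {
        private static FakeConnection shared;

        void initialize() {
            synchronized (SharedConnectionEngine.class) {
                if (shared == null) {
                    shared = new FakeConnection();
                }
            }
        }
    }
}
```

Creating a fresh InstanceScopedEngine for each document opens one connection per document, while fresh SharedConnectionEngine instances keep reusing the first one. A shared static connection brings its own lifecycle and thread-safety concerns, so whether it suits a given service is a separate decision.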