uima-dev mailing list archives

From "Miguel Alvarez" <miguelal...@gmail.com>
Subject RE: Ruta: reloadScript and external engines
Date Fri, 05 Feb 2016 16:00:51 GMT
Find attached the source files...

I assume the extensions don't support all the types of parameters (unless you tell me otherwise
:-) ), so I had to convert the resource parameters to simple string expressions, and the feature
assignments to string/integer parameter pairs...
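
For readers without the attachment, the reload-on-change idea discussed in this thread (keep the last modified date of the txt or csv file and reload when it changes) could look roughly like the sketch below. It is not the attached code and uses no Ruta API: the class name ReloadingWordList and its get() method are hypothetical, and it only assumes a plain text file with one entry per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Ruta: caches a word list and re-reads it
// only when the file's last-modified timestamp differs from the cached one.
final class ReloadingWordList {
    private final Path file;
    private FileTime loadedAt; // timestamp of the cached version, null before first load
    private List<String> words = Collections.emptyList();

    ReloadingWordList(Path file) {
        this.file = file;
    }

    // Returns the cached entries, re-reading the file first if it has changed.
    synchronized List<String> get() throws IOException {
        FileTime current = Files.getLastModifiedTime(file);
        if (!current.equals(loadedAt)) {
            words = Collections.unmodifiableList(
                    Files.readAllLines(file, StandardCharsets.UTF_8));
            loadedAt = current;
        }
        return words;
    }
}
```

An extension written this way would call get() each time it is applied, so an edited dictionary is picked up on the next document without setting reloadScript to true.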

-----Original Message-----
From: Peter Klügl [mailto:peter.kluegl@averbis.com] 
Sent: February 5, 2016 7:23
To: dev@uima.apache.org
Subject: Re: Ruta: reloadScript and external engines

hi,

ah ok. I must admit that I was not sure how this issue can be implemented since the connection
to the file is lost. The wordlist resources are in need of refactoring. I am curious how you
solved it.

Best,

Peter

Am 05.02.2016 um 16:03 schrieb Miguel Alvarez:
> I created two extensions that are the replica of markfast and 
> marktable but keep the last modified date of the txt or csv files and 
> reload them if it changes. They work well and this way I don't need to 
> have the reloadScript set to true.
>
> I created a Jira enhancement for this.
>
> Cheers
> Miguel
> On Feb 2, 2016 06:11, "Miguel Alvarez" <miguelal007@gmail.com> wrote:
>
>> Hi Peter
>>
>> Thanks again for your reply. I am trying to get more familiar with 
>> the RUTA code so I can make these changes myself, but I am not there 
>> yet, there are a lot of things I still don't understand :) I will 
>> create a Jira issue for now.
>>
>> I actually already tried yesterday the workaround you suggested but 
>> it doesn't seem to be working either. I have refactored the code so 
>> the dictionary logic is in separate scripts and set their parameter 
>> to true, only for those scripts. But the rest of the scripts have it 
>> set to false including the main script (the one that starts the chain 
>> of scripts), but it seems that unless I set all the scripts to true 
>> the dictionaries don't get reloaded.
>>
>> But now that I am thinking about it I should be able to create 
>> extensions that allow us to reload the dictionaries only, and that 
>> way we can leave that parameter to false for all the engines.
>>
>> I will let you know if that works.
>>
>> Cheers
>> Miguel
>> On Feb 1, 2016 23:54, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>
>>> Hi,
>>>
>>> the analysis engines do not necessarily need to be created anew. If 
>>> you want, you can create a jira issue for it and I will fix it.
>>>
>>> There is currently no option to reload the dictionaries separately. 
>>> As a workaround, you could extract/refactor all dictionary logic to 
>>> a separate ruta analysis engine/script and then only set the 
>>> parameter to true for this one.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Am 01.02.2016 um 20:04 schrieb Miguel Alvarez:
>>>> Hi Peter,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> It isn't necessarily causing any problems, I just wanted to
>>>> understand how it was meant to work. But I can explain to you a bit
>>>> better my situation, and maybe you have a better suggestion.
>>>>
>>>> We are currently setting the parameter reloadScript to true in our
>>>> RUTA engines so the dictionaries reload without us having to
>>>> restart the service. But we have some external engines, invoked
>>>> from RUTA scripts, which create connections to other servers, and
>>>> until now we have been storing these connections as class instance
>>>> variables in our external engines so they can be reused and the
>>>> engine doesn't need to create a new connection for every document
>>>> processed. And the initialize method checks whether the engine
>>>> instance already has an open connection, so no matter how many
>>>> times the initialize method is invoked only one connection is
>>>> established.
>>>>
>>>> But if we invoke this external engine from a RUTA script that has
>>>> the reloadScript parameter set to true, a new instance of the
>>>> engine is created for every document processed, and therefore a new
>>>> connection to the remote server will be established for each
>>>> document too, regardless of my check for an existing connection in
>>>> the initialize method (obviously because it is a brand new instance
>>>> every time).
>>>>
>>>> I guess one question I have is: Can we force the reload of 
>>>> dictionaries directly from the RUTA script or in any other way?
>>>>
>>>> Thanks,
>>>> Miguel
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
>>>> Sent: February 1, 2016 10:00
>>>> To: dev@uima.apache.org
>>>> Subject: Re: Ruta: reloadScript and external engines
>>>>
>>>> Hi,
>>>>
>>>> yes, this is correct and intended, but it is missing the original
>>>> idea and thus it is probably not necessary.
>>>>
>>>> The reload of the script needs to be supported for some special use
>>>> cases like changing the rules during the pipeline processing. The
>>>> additional analysis engines are however directly specified in the
>>>> configuration parameters and should only be changed with reconfigure().
>>>>
>>>> Does this cause problems? I would not change it right now because I
>>>> am also thinking about removing these parameters at all in a next major
>>>> release since they are redundant and could be induced using the script
>>>> files. The objects could be cached, but the initialize is normally the
>>>> expensive part.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> Am 30.01.2016 um 23:33 schrieb Miguel Alvarez:
>>>>> Hi Peter,
>>>>>
>>>>>
>>>>>
>>>>> I have another question about external engines :- ) When the 
>>>>> reloadScript parameter is set to false only one instance of the 
>>>>> external engine is created and the initialize method is invoked 
>>>>> only once before processing all the CASes. This is what I was expecting.
>>>>> But when the reloadScript is set to true the initialize method of
>>>>> the external engines is invoked once per CAS, as the documentation
>>>>> indicates, but it looks like a new instance of the external engine
>>>>> is created for each CAS too. Is this the expected behaviour?
>>>>> I was expecting for RUTA to create just one instance of the
>>>>> engine, and then on that instance invoke the initialize method
>>>>> once per CAS, but I couldn't find any information about this in the documentation.
>>>>>
>>>>>
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Miguel
>>>>>
>>>>>
>>>

