uima-dev mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: TextMarker
Date Mon, 03 Jan 2011 15:32:42 GMT
The wiki is running again...

Peter

On 03.01.2011 16:00, Peter Klügl wrote:
> Hi Thilo,
>
> On 01.01.2011 13:41, Thilo Goetz wrote:
>> Hi Peter,
>>
>> I downloaded the source trunk and got things mostly compiling
>> and running: I'm running Eclipse 3.5.2, RCP edition, and installed
>> the latest UIMA plugins and DLTK 1.0.2.  I also had to find the
>> Mozilla xpcom plugin.  The only thing not compiling for me are
>> references to com.sun.org.apache.apache.xpath.XPathAPI.  The
>> internet tells me that those could be fixed by using Xalan
>> directly, but I haven't tried.
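Assuming the reference is to a JDK-internal XPathAPI class such as `com.sun.org.apache.xpath.internal.XPathAPI`, an alternative to pulling in Xalan is the standard `javax.xml.xpath` API, which has shipped with the JDK since Java 5. The following is a minimal sketch, not code from TextMarker; the class `XPathWithoutInternals` and its methods `parse` and `selectNodeList` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: replaces calls like XPathAPI.selectNodeList(doc, expr)
// with the standard javax.xml.xpath API, so neither an internal JDK class
// nor an extra Xalan jar is needed.
class XPathWithoutInternals {

    // Evaluates an XPath expression against the document and returns the matching nodes.
    static NodeList selectNodeList(Document doc, String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression: " + expr, e);
        }
    }

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<dict><entry>Apache</entry><entry>UIMA</entry></dict>");
        NodeList entries = selectNodeList(doc, "/dict/entry");
        for (int i = 0; i < entries.getLength(); i++) {
            System.out.println(entries.item(i).getTextContent());
        }
    }
}
```

Using only `javax.xml.*` and `org.w3c.dom` keeps the code on public JDK APIs, which is also easier to justify in a license review than an additional third-party jar.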
>>
>
> The XPCom plugin is only necessary for the HTML visualization of the 
> CEV plugin. The XULRunner plugin provides the implementations of the 
> interfaces for the manipulation of the DOM within Eclipse. Both 
> plugins often cause problems, but I haven't found a better solution yet.
>
> About the XML problem: Which plugin has that reference? I had a 
> similar problem about three years ago, but that should be solved. 
> However, I'm not an expert on the different XML integrations in Java. 
> If I'm not mistaken, the only place in my code where XML is actively 
> used is the engine project, which can load dictionaries into 
> trie-like structures. But that should work just fine without 
> additional libraries. Can you give me more information about that 
> problem?
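To illustrate what a trie-like dictionary structure means here, the following is a minimal sketch, not the engine project's actual code: a trie keyed on tokens, so that multi-word entries share their prefixes and the longest entry starting at a given token can be found in a single left-to-right pass. The class `TokenTrie` and its methods `add` and `longestMatch` are hypothetical; tokenization and loading entries from an XML file are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token-level trie for dictionary lookup.
class TokenTrie {

    // Each node maps the next token to a child node; isEntry marks that the
    // path from the root to this node spells a complete dictionary entry.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean isEntry;
    }

    private final Node root = new Node();

    // Adds one (possibly multi-token) dictionary entry.
    void add(List<String> entryTokens) {
        Node node = root;
        for (String token : entryTokens) {
            node = node.children.computeIfAbsent(token, t -> new Node());
        }
        node.isEntry = true;
    }

    // Returns the number of tokens in the longest entry that starts at
    // position start in tokens, or 0 if no entry starts there.
    int longestMatch(List<String> tokens, int start) {
        Node node = root;
        int longest = 0;
        for (int i = start; i < tokens.size(); i++) {
            node = node.children.get(tokens.get(i));
            if (node == null) {
                break; // no entry continues with this token
            }
            if (node.isEntry) {
                longest = i - start + 1;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        TokenTrie dict = new TokenTrie();
        dict.add(Arrays.asList("Apache"));
        dict.add(Arrays.asList("Apache", "UIMA"));
        List<String> text = Arrays.asList("the", "Apache", "UIMA", "project");
        System.out.println(dict.longestMatch(text, 1)); // prints 2
    }
}
```

Sharing prefixes means a lookup never compares the text against entries that diverge early, which is the usual reason to prefer a trie over scanning a flat word list.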
>
>> My main issue right now is that the TextMarker wiki is down,
>> and that seems to be the only source of documentation (unless
>> I missed something).
>
> I'm sorry about that. My colleagues moved the wiki to a new server 
> that is not as stable as expected. We will fix that ASAP. The wiki is 
> still the only bit of documentation that currently exists.
>
>>
>> I noticed that TextMarker uses a lot of 3rd party libraries.
>> So we'll need to compile an exhaustive list of the libs
>> that are being used, their licenses and provenance, and in
>> case the license is bad, possible alternatives.
>>
>
> I'm willing to reduce the use of 3rd party libraries, or replace any 
> of them, wherever possible.
>
> The most important dependencies are the UIMA-runtime plugin, the 
> Eclipse-plugins (core, ui...), the plugins of the DLTK-Core framework 
> and ANTLR (used for the AST in the IDE and for interpreting the rules 
> in the analysis engines). The optional HTML extension of the CEV 
> plugin uses an HTML parser in addition to the XPCom dependency.
>
> Some plugins were hosted on SourceForge only for historical reasons; 
> they are not part of the TextMarker system. I have now removed them:
> de.uniwue.tm.cas.converter
> de.uniwue.tm.old.OfficeConverter
> de.uniwue.tm.textmarker.uutuc
>
>
> Peter
>
>
>> --Thilo
>>
>> On 12/14/2010 15:55, Peter Klügl wrote:
>>> Hello,
>>>
>>> We would like to contribute our TextMarker system to Apache UIMA and
>>> ask whether the development team is interested in this contribution.
>>> The system is currently hosted on SourceForge
>>> (http://sourceforge.net/projects/textmarker/) and there is some
>>> documentation in the project wiki
>>> (http://tmwiki.informatik.uni-wuerzburg.de/).
>>>
>>> As a starting point for that discussion, I'll summarize the
>>> current status of the system. TextMarker is an Eclipse-based tool
>>> implemented in pure Java that can, among other things, be used to
>>> prototype analysis engines or develop complex handcrafted text
>>> processing applications. It consists of four major parts:
>>>
>>> Language:
>>> The rule (or rather script) language can be compared to regular
>>> expressions over annotations with additional conditions and actions.
>>> There are currently 28 different conditions and 34 actions. The
>>> conditions range from a test on a feature value to a test of whether
>>> the matched annotation is contained in another annotation of a given
>>> type; the actions range from creating an annotation to applying an
>>> external dictionary or analysis engine. A TextMarker script can import
>>> type systems or define new types or variables. There are also some
>>> more complex control structures for procedure calls, conditioned
>>> statements or recursion. The TextMarker language (and inference) is in
>>> active use in some productive applications here, but it lacks test
>>> cases. However, we are currently writing uimaFIT-based component tests
>>> to improve the quality management.
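The idea of a rule as a pattern over annotations, with a condition on each matched element and an action applied to the whole match, can be sketched in plain Java. This is a simplified illustration under stated assumptions, not TextMarker's engine or syntax: `Ann` is a hypothetical stand-in for a UIMA annotation, the type names `W`, `NUM` and `Date` are chosen for illustration, the pattern is a fixed sequence without quantifiers, and the only action creates a new annotation over the matched span.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "regular expressions over annotations" reduced to
// plain sequences: each element requires a type and a condition.
class AnnotationRuleSketch {

    // Stand-in for a UIMA annotation: a type name plus a span and its text.
    static final class Ann {
        final String type;
        final int begin;
        final int end;
        final String text;

        Ann(String type, int begin, int end, String text) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // One rule element: the required annotation type and a condition on it.
    static final class Element {
        final String type;
        final Predicate<Ann> condition;

        Element(String type, Predicate<Ann> condition) {
            this.type = type;
            this.condition = condition;
        }
    }

    // Tries the rule at every position of the (sorted) annotation list. When
    // all elements match consecutive annotations, the action creates a new
    // annotation of newType covering the whole match.
    static List<Ann> applyRule(List<Element> rule, List<Ann> anns, String newType, String doc) {
        List<Ann> created = new ArrayList<>();
        for (int start = 0; start + rule.size() <= anns.size(); start++) {
            boolean matched = true;
            for (int k = 0; k < rule.size() && matched; k++) {
                Ann a = anns.get(start + k);
                Element e = rule.get(k);
                matched = a.type.equals(e.type) && e.condition.test(a);
            }
            if (matched) {
                int begin = anns.get(start).begin;
                int end = anns.get(start + rule.size() - 1).end;
                created.add(new Ann(newType, begin, end, doc.substring(begin, end)));
            }
        }
        return created;
    }

    // Example rule: NUM, capitalized W, NUM -> Date, on "born 3 Jan 2011".
    static List<Ann> demo() {
        String doc = "born 3 Jan 2011";
        List<Ann> anns = Arrays.asList(
                new Ann("W", 0, 4, "born"),
                new Ann("NUM", 5, 6, "3"),
                new Ann("W", 7, 10, "Jan"),
                new Ann("NUM", 11, 15, "2011"));
        List<Element> rule = Arrays.asList(
                new Element("NUM", a -> true),
                new Element("W", a -> Character.isUpperCase(a.text.charAt(0))),
                new Element("NUM", a -> true));
        return applyRule(rule, anns, "Date", doc);
    }

    public static void main(String[] args) {
        for (Ann a : demo()) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + "]: " + a.text);
        }
    }
}
```

Running `main` prints `Date [5,15]: 3 Jan 2011`. A real rule language adds quantifiers, many more conditions and actions, and works on the CAS index rather than a plain list.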
>>>
>>> Workbench:
>>> The Eclipse-based tool for developing TextMarker scripts is
>>> currently based on DLTK 1.0 (http://www.eclipse.org/dltk/), and its
>>> editor supports syntax highlighting, syntax checks, context-sensitive
>>> auto-completion, formatting, mark occurrences, open declaration and
>>> other useful features commonly known from IDEs. For each script
>>> file, a type system and an executable analysis engine are created.
>>> Therefore, it's quite simple and efficient to create an analysis
>>> engine with a few lines of TextMarker rules. The workbench supports
>>> testing on annotated xmiCAS files while writing new rules and provides
>>> some minimal debugging functionality that explains why and on what
>>> text a rule was executed.
>>>
>>> CEV:
>>> This plugin can be used to edit or visualize xmiCAS and is also able to
>>> render HTML. It is heavily used by the testing and explanation components.
>>>
>>> TextRuler:
>>> This rule-learning framework is more of a playground and was mainly
>>> implemented by students. There are currently more or less working
>>> implementations of LP2, WHISK, WIEN, RAPIER and an algorithm of our
>>> own, and three other algorithms are being implemented.
>>>
>>>
>>> Overall, the system has been running stably for a year now, but it
>>> falls short in code quality, documentation and test cases. We are also
>>> willing to change the name of the system, if someone can think of a
>>> better one.
>>>
>>> I'm looking forward to your comments.
>>>
>>> Best regards,
>>>
>>> Peter
>>>
>>>

