uima-dev mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: A light roadmap proposal for UIMA addons
Date Mon, 19 Nov 2012 13:55:43 GMT
On 18.11.2012 at 09:10, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:

> Another thing I'd like to add to our website / documentation is something
> like "how to do NLP task X within Apache UIMA"; we currently have a list of
> external sources plus the addons page but I think it'd be nice for a user
> to know which known open source UIMA enabled options exist for doing e.g.
> language detection, tokenization, etc either in the addons package or from
> some other sources.
> What do you think?

I'm not sure how much detail would be useful here, because the whole ecosystem has grown so
large in the meantime. I think the external resources page is quite good, but going down to the
component level may just be a bit too much. To give you a few numbers just from DKPro Core:

- ~17 tools are wrapped for UIMA in DKPro Core trunk. A good part of these
  provide more than one component! (e.g. ClearNLP, Stanford NLP, OpenNLP, …).
  In addition to those ~17, there are also several original components. 
  Unfortunately, I didn't come up with an easy way to count the actual
  components, but I would guess something like 30+.

- ~17 modules with readers and writers for various formats are provided in
  DKPro Core trunk.

- 62 artifacts are returned on a Maven Central search [1] for DKPro Core
  1.4.0. I was admittedly a bit shocked when I noticed this recently. In Eclipse,
  I usually don't count the stuff. The upcoming DKPro Core version will have even
  more than that.

- 81 different models have been packaged for the various tools in various
  languages and are distributed via Maven [2]. There are a couple more available
  for the TreeTagger module, but due to license reasons we can only provide a
  script for people to package them themselves.

... and this is DKPro Core alone, not to mention the UIMA Sandbox, ClearTK, cTAKES and
whatnot. Listing them all at the component level would, I think, make for a huge list!

-- Richard

[1] http://search.maven.org/#search%7Cga%7C1%7Cdkpro
[2] https://docs.google.com/spreadsheet/pub?key=0ApGcdapz0xSYdGh2azY2ODMtZDRNczUySEZJUFpXM2c

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
