uima-dev mailing list archives

From Thilo Götz <twgo...@gmx.de>
Subject Re: Guidelines for a mutual contribution
Date Thu, 19 May 2011 13:47:08 GMT
On 5/19/2011 15:04, Nicolas Hernandez wrote:
> Hello everyone,
> Jörn, yes, it (training MaxEnt models for OpenNLP from the French
> Treebank) is actually part of our plan (building a French-speaking
> UIMA community). We also wanted to contribute to the OpenNLP project,
> since no models were available for French processing!
> About the right to train models on this data set and then distribute
> them under the Apache License 2.0: it took us time to get the right to
> do it, but I think that was because we were the first to ask for it. Now
> they know about it. I know that the MaltParser team
> (http://maltparser.org/) would also be interested in the grant. You
> may ask the French Treebank authors. I can also ask them to add an
> explicit mention of this right on their web site.
> As far as I know, the training data sets for the English and German POS
> models are not freely available, are they?

The English model was trained on the Brown corpus, which is free.
The German model was trained on a non-free corpus.

> Finally, Jörn, I'm not sure I understand. Do you think the IP
> clearance process is not suited to submitting our contribution?
> Tommaso, I will write a blog post about the procedure I used to train
> the models. There is nothing really special about it. I used some
> freely available (under AL2) AE components. The HMM learner is already
> present in the HMM Tagger addon. The few other UIMA components I used
> are also available on some Google forges (uima-common, uima-connectors,
> uima-type-mapper).
> Regards
> /Nicolas
> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>>> If you also plan to donate the models, I think the IP clearance is the
>>> right way, both for UIMA and for you as a researcher.
>> In my opinion it is very important that we have the possibility
>> to retrain the models on the data set, otherwise it will block
>> code changes and bug fixes.
>> Therefore I think we need the right to train models on this
>> data set and then distribute them under AL 2.0.
>> Jörn
