uima-dev mailing list archives

From Nicolas Hernandez <nicolas.hernan...@gmail.com>
Subject Re: Guidelines for a mutual contribution
Date Thu, 19 May 2011 13:04:20 GMT
Hello Everyone

Jörn, yes, training MaxEnt models for OpenNLP from the French
Treebank is actually part of our plan (building a French-speaking
UIMA community). We also wanted to contribute to the OpenNLP project,
since no models were available for French processing!

About the right to train models on this data set and then distribute
them under Apache License 2.0: it took us some time to obtain this
right, but I think that was because we were the first to ask. Now
they know about it. I know that the MaltParser team
(http://maltparser.org/) would also be interested in such a grant. You
may ask the French Treebank authors directly. I can also ask them to
put an explicit mention of this right on their web site.

As far as I know, the training data sets for the English and German POS
models are not freely available, are they?

Finally, Jörn, I am not sure I understand. Do you think the IP
clearance process is not suited to submitting our contribution?

Tommaso, I will write a blog post describing the procedure I used to
train the models. There is nothing really special about it. I used some
freely available (under AL2) AE components. The HMM learner is already
present in the HMM Tagger addon. The few other UIMA components I used
are also available on Google-hosted forges (uima-common,
uima-connectors, uima-type-mapper).

Regards

/Nicolas

On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>>
>> If you also plan to donate the models I think the IP clearance is the
>> right
>> way both for UIMA and for you as a researcher.
>>
>
> In my opinion it is very important that we have the possibility
> to retrain the models on the data set, otherwise it will block
> code changes and bug fixes.
>
> Therefore I think we need the right to train models on this
> data set and then distribute them under AL 2.0.
>
> Jörn
>



-- 
nicolas.hernandez@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
