uima-dev mailing list archives

From Nicolas Hernandez <nicolas.hernan...@gmail.com>
Subject Re: Guidelines for a mutual contribution
Date Thu, 26 May 2011 12:36:33 GMT

French data models for the Apache UIMA Sandbox HMM Tagger have been
submitted via the JIRA issue

Documentation on the procedure to build the models from the French
Treebank can be found here (it happens to be in French...)

The SLA has been sent and we are waiting for the acknowledgement.

I have prepared an IP form but do not have the right to commit it...

Finally, is there an "appropriate volunteer" to carry out the IP
Clearance process?

I hope I have not forgotten anything.

Best regards


On Thu, May 19, 2011 at 3:47 PM, Thilo Götz <twgoetz@gmx.de> wrote:
> On 5/19/2011 15:04, Nicolas Hernandez wrote:
>> Hello Everyone
>> Jörn, yes it (training MaxEnt models for OpenNLP from the French
>> Treebank) is actually part of our plan (building a French-speaking
>> UIMA community). We also wanted to contribute to the OpenNLP project
>> since no models were available for French processing!
>> About the right to train models on this data set and then distribute
>> them under Apache License 2: it took time for us to get the right to
>> do it, but I think that was because we were the first to ask for it.
>> Now they know about it. I know that the MaltParser team
>> (http://maltparser.org/) would also be interested in the grant. You
>> may ask the French Treebank authors. I can also ask them to add an
>> explicit mention of the right to do it on their web site.
>> As far as I know, the training data sets for the English and German
>> POS models are not freely available, are they?
> The English model was trained on the Brown corpus, which is free.
> The German model was trained on a non-free corpus.
>> Finally, Jörn, I am not sure I understand. Do you think the IP
>> clearance process is not suited to submitting our contribution?
>> Tommaso, I will write a blog post on the procedure I used to train
>> the models. There is nothing really special. I used some freely
>> available (under AL2) AE components. The HMM learner is already
>> present in the HMM Tagger addon. The few other UIMA components I used
>> are also available on some Google forges (uima-common,
>> uima-connectors, uima-type-mapper).
>> Regards
>> /Nicolas
>> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>>>> If you also plan to donate the models I think the IP clearance is the
>>>> right way both for UIMA and for you as a researcher.
>>> In my opinion it is very important that we have the possibility
>>> to retrain the models on the data set, otherwise it will block
>>> code changes and bug fixes.
>>> Therefore I think we need the right to train models on this
>>> data set and then distribute them under AL 2.0.
>>> Jörn

Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
