uima-dev mailing list archives

From Nicolas Hernandez <nicolas.hernan...@gmail.com>
Subject Re: Guidelines for a mutual contribution
Date Thu, 26 May 2011 12:36:33 GMT
Hi

French data models for the Apache UIMA Sandbox HMM Tagger have been
submitted via the jira issue
https://issues.apache.org/jira/browse/UIMA-2146

Documentation on the procedure for building the models from the French
Treebank can be found here (it happens to be in French...):
http://enicolashernandez.blogspot.com/2011/05/construire-des-modelisations-du-french.html

The SLA has been sent and we are waiting for the acknowledgement.

I have prepared an IP form but do not have the right to commit it...

Finally, is there an "appropriate volunteer" to carry out the IP
Clearance process?

I hope I have not forgotten anything.

Best regards

/Nicolas

On Thu, May 19, 2011 at 3:47 PM, Thilo Götz <twgoetz@gmx.de> wrote:
> On 5/19/2011 15:04, Nicolas Hernandez wrote:
>> Hello Everyone
>>
>> Jörn, yes it (training MaxEnt models for OpenNLP from the French
>> Treebank) is actually part of our plan (building a French-Speaking
>> UIMA Community). We also wanted to contribute to the OpenNLP project
>> since no models were available for French processing!
>>
>> About the right to train models on this data set and then distribute
>> them under Apache License 2: It took time for us to get the right to
>> do it, but I think that was because we were the first to ask. Now
>> they know about it. I know that the maltparser team
>> (http://maltparser.org/) would also be interested in the grant. You
>> may ask the French Treebank authors. I can also ask them to
>> put an explicit mention of the right to do so on their web
>> site.
>>
>> As far as I know, the training data sets for the English and German POS
>> models are not freely available, are they?
>
> The English model was trained on the Brown corpus, which is free.
> The German model was trained on a non-free corpus.
>
>>
>> Finally, Jörn, I'm not sure I understand. Do you think the IP
>> clearance process is not suitable for submitting our contribution?
>>
>> Tommaso, I will write a blog post on the procedure I used to train the models.
>> There is nothing really special. I used some freely available (under
>> AL2) AE components. The HMM learner is already present in the HMM
>> Tagger addon. The few other UIMA components I used are also available
>> on some google forges (uima-common, uima-connectors,
>> uima-type-mapper).
>>
>> Regards
>>
>> /Nicolas
>>
>> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>>>>
>>>> If you also plan to donate the models I think the IP clearance is the
>>>> right
>>>> way both for UIMA and for you as a researcher.
>>>>
>>>
>>> In my opinion it is very important that we have the possibility
>>> to retrain the models on the data set, otherwise it will block
>>> code changes and bug fixes.
>>>
>>> Therefore I think we need the right to train models on this
>>> data set and then distribute them under AL 2.0.
>>>
>>> Jörn
>>>
>>
>>
>>
>



-- 
nicolas.hernandez@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
