uima-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Guidelines for a mutual contribution
Date Wed, 15 Jun 2011 15:58:24 GMT
2011/6/15 Tommaso Teofili <tommaso.teofili@gmail.com>

> Nicolas,
> your post on opennlp-user@ made me realize we didn't take care of helping
> you here yet.
> Did you get the ACK for your SGA?
>

I see it's been recorded, so I think we can proceed.
Tommaso


> Regards,
> Tommaso
>
> 2011/5/26 Nicolas Hernandez <nicolas.hernandez@gmail.com>
>
>> Hi
>>
>> French data models for the Apache UIMA Sandbox HMM Tagger have been
>> submitted via the jira issue
>> https://issues.apache.org/jira/browse/UIMA-2146
>>
>> Documentation on the procedure to build the models from the French
>> Treebank can be found here (incidentally, it is in French...)
>>
>> http://enicolashernandez.blogspot.com/2011/05/construire-des-modelisations-du-french.html
>>
>> The SGA has been sent and we are waiting to receive the ack.
>>
>> I have prepared an IP form but do not have the right to commit it...
>>
>> Finally, is there an "appropriate volunteer" to carry out the IP
>> Clearance process?
>>
>> I hope I have not forgotten anything.
>>
>> Best regards
>>
>> /Nicolas
>>
>> On Thu, May 19, 2011 at 3:47 PM, Thilo Götz <twgoetz@gmx.de> wrote:
>> > On 5/19/2011 15:04, Nicolas Hernandez wrote:
>> >> Hello Everyone
>> >>
>> >> Jörn, yes it (training MaxEnt models for OpenNLP from the French
>> >> Treebank) is actually part of our plan (building a French-Speaking
>> >> UIMA Community). We also wanted to contribute to the OpenNLP project
>> >> since no models were available for French processing!
>> >>
>> >> About the right to train models on this data set and then distribute
>> >> them under Apache License 2: It took time for us to get the right to
>> >> do it, but I think that was because we were the first to ask. Now
>> >> they know about it. I know that the maltparser team
>> >> (http://maltparser.org/) would also be interested in the grant. You
>> >> may ask the French Treebank authors. I can also ask them to put
>> >> an explicit mention of the right to do so on their web
>> >> site.
>> >>
>> >> As far as I know, the training data sets for the English and German POS
>> >> models are not freely available, are they?
>> >
>> > The English model was trained on the Brown corpus, which is free.
>> > The German model was trained on a non-free corpus.
>> >
>> >>
>> >> Eventually, Jörn, I'm not sure I understand. Do you think the IP
>> >> clearance process is not suited to submitting our contribution?
>> >>
>> >> Tommaso, I will post on my blog the procedure I used to train the models.
>> >> There is nothing really special. I used some freely available (under
>> >> AL2) AE components. The HMM learner is already present in the HMM
>> >> Tagger addon. The few other UIMA components I used are also available
>> >> on some google forges (uima-common, uima-connectors,
>> >> uima-type-mapper).
>> >>
>> >> Regards
>> >>
>> >> /Nicolas
>> >>
>> >> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>> >>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>> >>>>
>> >>>> If you also plan to donate the models I think the IP clearance is the
>> >>>> right way both for UIMA and for you as a researcher.
>> >>>>
>> >>>
>> >>> In my opinion it is very important that we have the possibility
>> >>> to retrain the models on the data set, otherwise it will block
>> >>> code changes and bug fixes.
>> >>>
>> >>> Therefore I think we need the right to train models on this
>> >>> data set and then distribute them under AL 2.0.
>> >>>
>> >>> Jörn
>> >>>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> nicolas.hernandez@univ-nantes.fr
>> #
>> http://enicolashernandez.blogspot.com
>> http://www.univ-nantes.fr/hernandez-n
>> #
>> Laboratoire LINA-TALN CNRS UMR 6241
>> tel. +33 (0)2 51 12 58 55
>> #
>> Université de Nantes - Institut Universitaire de Technologie -
>> Département Informatique
>> tel. +33 (0)2 40 30 60 67
>>
>
>
