uima-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Guidelines for a mutual contribution
Date Wed, 18 May 2011 15:10:39 GMT
Sounds very interesting, I am actually interested in training
the OpenNLP POS Tagger with this data. I guess we can also use it
to make a Tokenizer and Sentence Detector model.

Would it be possible for the owner of that data to grant the ASF itself
the right to distribute models trained on it?


On 5/18/11 5:04 PM, Nicolas Hernandez wrote:
> Dear All,
> I come back one year later...
> To remind you, we used a French Treebank corpus
> (http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php) to train
> models for processing French with the HMM tagger addon.
> I first contacted you for some advice, since we did not own the
> resource we used and we were not sure whether we were allowed to
> distribute our models under the Apache license. We discussed this with
> the resource owner and we thought that an alternative way to distribute
> the models we trained could be to jointly submit the models.
> Eventually, we got the grant from the owner to distribute the models
> we built up under the Apache License v2.
> In short, we built French models for part-of-speech (pos),
> morphological (mph) and grammatical function (fct) tagging, as well as
> lemmatization (lemma). We use the HMM tagger to perform the various
> tagging tasks. A patch was recently submitted to make the HMM tagger
> less dependent on the type system.
> See https://issues.apache.org/jira/browse/UIMA-2110
> Before submitting the models to the project, I have some new
> questions. As researchers, it is important to us that our work be
> cited by other researchers. In addition, the models are only a few
> files, but they represent a substantial contribution to the French
> Natural Language Processing community.
> So I was wondering whether you still advise me to perform the IP
> clearance procedure or just to add a specific mention in the NOTICE
> file.
> In the first case, could you find me an "appropriate volunteer" to
> carry out the IP Clearance process?
> Another "substantial" question... our model files take about 5 MB
> each (pos, mph and fct), except the lemma model file, which takes 24 MB.
> Alternatively, we built a merged model for pos, mph and fct, which
> takes 6.9 MB. Do you think it may cause a problem if we submit all of
> them?
> Best regards
> /Nicolas
> ---------- Forwarded message ----------
> From: Nicolas Hernandez <nicolas.hernandez@gmail.com>
> Date: Thu, Nov 4, 2010 at 11:28 AM
> Subject: Re: Guidelines for a mutual contribution
> To: dev@uima.apache.org
> Thilo, we would like to submit a language model which was trained on a
> French Treebank corpus for the tagger addon. We do not own the
> treebank corpus we used. We are in discussion with its owner to find out
> whether we would still comply with the treebank license by distributing
> a model built on it under the Apache License.
> We thought that an alternative way to distribute the model we trained
> could be to jointly submit the model with the owner of the treebank.
> Marshall, I will consult all the links you mentioned and come back to you if necessary.
> Thanks
> On Thu, Nov 4, 2010 at 11:06 AM, Marshall Schor <msa@schor.com> wrote:
>> On 11/4/2010 5:06 AM, Nicolas Hernandez wrote:
>>> Hi
>>> Can someone point me to some guidelines for committing a
>>> mutual contribution? In other words, how should we proceed when there
>>> are two developers or corporations involved in a work they would like
>>> to commit?
>>> I cannot find any information on this subject on
>>> http://www.apache.org/licenses/ or on
>>> http://uima.apache.org/contribution-policy.html
>>> Do we each have to submit an "Individual Contributor License
>>> Agreement" to the ASF
>> Each person has to have an "Individual Contributor License Agreement" on file
>> with the ASF (and, if appropriate, a Corporate Contributor License Agreement;
>> see http://www.apache.org/licenses/ and search for Corporate CLA).
>> When you post the contribution, attach it to a Jira and state in the Jira itself
>> what you are doing (including granting the ASF a license under the Apache
>> Software License version 2.0).
>> If the contribution represents "substantial" work developed outside of the ASF's
>> normal process, it will need to go through the IP clearance process, as Tommaso
>> described.
>>> and clearly specify the complete attribution in the NOTICE file of
>>> our contribution?
>> Here's info on what goes in the NOTICE file:
>> http://www.apache.org/legal/src-headers.html#notice
>> and here's a link which says that the ASF prefers that contributors not put
>> individual copyright statements into the file:
>> http://www.apache.org/dev/apply-license.html#contributor-copyright - which links to
>> this, in particular about moving existing copyright from source into the NOTICE file:
>> http://www.apache.org/legal/src-headers.html#header-existingcopyright
>> Does this answer your question?
>> -Marshall Schor
>>> Thanks in advance
>>> /Nicolas
> --
> Nicolas.Hernandez@univ-nantes.fr
> --
> http://enicolashernandez.blogspot.com
> http://www.univ-nantes.fr/hernandez-n
> --
> # Laboratoire LINA-TALN CNRS UMR 6241
> tel. +33 (0)2 51 12 58 55
> # Université de Nantes - Institut Universitaire de Technologie -
> Département Informatique
> tel. +33 (0)2 40 30 60 67
