ctakes-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: licensing question
Date Thu, 04 Oct 2012 17:29:59 GMT

It depends on the texts you train it on; usually it's a gray area. There are corpora
which are very restrictive in this regard and only allow usage for research,
but that conflicts with the Apache License.

As far as I know, copyright laws on the source text do not really apply here,
because the models just contain statistics or bigrams but no original text.

Anyway, if you train on your own text and then release the model under AL 2.0,
it's safe to include it and distribute it.

At OpenNLP we decided not to distribute any models at Apache that are trained on
corpora without first discussing it on the legal list. But we never spoke to them,
and I personally much prefer the idea of producing open training data which is
Apache friendly (e.g. based on Wikinews or Wikipedia).


On 10/04/2012 06:39 PM, Chen, Pei wrote:
> Hi Jorn,
> If we trained a model and included it as a resource within the ASF repo, I just wanted
> to confirm whether that's acceptable in the ASF even though it's in a binary format.
> Were there any issues for OpenNLP with including trained models?
> Thanks,
> Pei
>> -----Original Message-----
>> From: Jörn Kottmann [mailto:kottmann@gmail.com]
>> Sent: Wednesday, August 01, 2012 8:01 AM
>> To: ctakes-dev@incubator.apache.org
>> Subject: Re: licensing question
>> On 08/01/2012 01:01 PM, Miller, Timothy wrote:
>>> There was some chatter last week about resources potentially being
>>> downloaded via Maven for license compatibility reasons. Just wondering if
>>> that opens up the possibility of using external libraries that are not
>>> Apache-licensed that would also be auto-downloaded under certain Maven
>>> build commands. Specifically, I was thinking of the GPL-licensed Berkeley
>>> parser, which I've used to get significantly higher accuracy than the OpenNLP
>>> parser we currently wrap in our constituency parser module.
>> Making scripts or Maven build commands which download stuff is fine, but it
>> might turn out to be quite limiting for your users who need the freedom of
>> the AL. That will be a problem if Berkeley is the only option.
>> The HBase people, for example, have an optional dependency on LZO, which is
>> GPL, and people there just need to download and install it themselves.
>> See here:
>> http://hbase.apache.org/book/lzo.compression.html
>> Speaking as an OpenNLP committer now, it would of course be nice to make
>> our parser better.
>> If you want to work on that, we will be happy to receive some patches.
>> Jörn
