uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [VOTE] Release Apache UIMA Ruta 2.4.0 RC3
Date Mon, 08 Feb 2016 20:56:28 GMT
So, first I'd like to summarize, in case I don't fully understand the issue.

Ruta contains some examples; the example data include a 90K file, FirstNames.txt,
in example-projects/GermanNovels/resources.

From what I can see, there are no actual German Novels included in the

From the discussion, it seems the word lists were not originally part of the
contribution; but in a comment in UIMA-3926 Peter asks if the word lists could be
contributed, but not the novels, and Stefan then contributed them.

I am not a lawyer, so this is not a legal opinion, but I did a quick internet
search and believe that compiling a list of words used in a novel does not
infringe the copyright in that novel, because this data is entirely independent
of the expressive value of any of the underlying sources that might have been
used to compile the list; and the list has lost any similarity to the underlying
sources in terms of things like plot, theme, etc.

So I think the risk is low.  We could probably reduce the risk by asking Stephan
where these lists came from, and if he is aware of any IP issues with them.

To the extent that we collect information and form opinions on issues like this,
I recommend adding a file to the SVN, not necessarily included in the build,
called something like license-notice-research.txt, just to record these things
in one place, so we can find it quickly if a question comes up later and we want
to remember what we did and why.


On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
> On 08.02.2016, at 11:11, Peter Klügl <peter.kluegl@averbis.com> wrote:
>> On 08.02.2016, at 10:44, Richard Eckart de Castilho wrote:
>>> On 08.02.2016, at 10:11, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>> Hi,
>>>> On 07.02.2016, at 19:52, Richard Eckart de Castilho wrote:
>>>>> Checks:
>>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies - OK
>>>>> - the FirstNames.txt file in GermanNovels is quite large 90k, but no
>>>>> source info/license for this file is given anywhere: doesn't seem OK
>>>>> - stopping checks at this point for the moment
>>>> What kind of source info/license would you expect? The file together
>>>> with the other files was contributed as part of UIMA-3926 with an ICLA
>>>> present. I do not remember if I knew the source of the file by then, but
>>>> I remember that I had some conversations with the contributor that the
>>>> files need to be OK for a contribution. That's the reason why the
>>>> test/dev data was not contributed since it had some CC license that was
>>>> problematic.
>>> The other dev/test data doesn't seem problematic at all, but the 90k names
>>> file seems non-trivial. If it were CC, the license would need to be mentioned
>>> in a LICENSE.txt file. My suggestion would be to simply strip the file down
>>> to the names needed for the example.
>> If I have to guess I'd say that the names have been crawled and that
>> there is no original source file with a specific license.
>> The novels had the CC license last time I checked. I do not remember
>> all, but when I looked it up in Apache's third party pages, it indicated
>> that it was not possible to include them. However, I could have been wrong.
>> Hmm... it depends on what is needed for the example. The initial examples
>> were 10-20 novels. I could strip it down to the firstnames of one novel
>> I remember to be part of the dev set, but is that really necessary?
> Let's see what Marshall thinks about it.
> -- Richard
