uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [VOTE] Release Apache UIMA Ruta 2.4.0 RC3
Date Tue, 09 Feb 2016 15:32:54 GMT
I agree with this analysis; I think this is minimal risk.


On 2/9/2016 4:24 AM, Peter Klügl wrote:
> He crawled it from this site [1] and then he modified the result by
> removing entries or single letters.
> I do not see any license notice. Is this a good or bad sign for us?
> IANAL (and actually do not know much about it) but I would assume that
> it is not problematic. There is no specific source file and the owner
> probably cannot claim copyright on single first names.
> Best,
> Peter
> [1] http://www.vornamen-liste.de/
> On 09.02.2016 at 10:17, Peter Klügl wrote:
>> I additionally sent an email to the last address I know.
>> On 08.02.2016 at 22:26, Richard Eckart de Castilho wrote:
>>> The problem I see is that we currently do not know where the file comes from
>>> (provenance). I find it hard to believe that the file was an original creation
>>> by Stefan. I believe that it could take quite some time to compile such a
>>> list of names. More likely, in my opinion, the file was obtained from
>>> some third-party source.
>>> If we knew that third-party source, we might easily be able to clear IP.
>>> Since we do not know it, we currently have to resort to speculation about the
>>> lawfulness of compiling specialized unigram lists.
>>> It looks like we can agree this is not a blocker for the present release, as
>>> the risk involved is apparently very low. Still, we should try to clear this.
>>> I've placed a comment on UIMA-3926 asking Stefan to shed some light on the
>>> provenance of the file. Let's see what comes of it.
>>> Thanks for digging up the issue number, Marshall!
>>> Cheers,
>>> -- Richard
>>>> On 08.02.2016, at 21:56, Marshall Schor <msa@schor.com> wrote:
>>>> So, first I'd like to summarize, in case I don't fully understand the issue.
>>>> Ruta contains some examples; the example data include a 90K file, FirstNames.txt,
>>>> in example-projects/GermanNovels/resources.
>>>> From what I can see, there are no actual German novels included in
>>>> example-projects/GermanNovels.
>>>> From the discussion, it seems the word lists were not originally part of the
>>>> contribution; but in a comment in UIMA-3926 Peter asks if the word lists could be
>>>> contributed, but not the novels, and Stefan then contributed them.
>>>> I am not a lawyer, so this is not a legal opinion, but I did a quick internet
>>>> search and believe that compiling a list of words used in a novel does not
>>>> infringe the copyright in that novel, because this data is entirely independent
>>>> of the expressive value of any of the underlying sources that might have been
>>>> used to compile the list; and the list has lost any similarity to the underlying
>>>> sources in terms of things like plot, theme, etc.
>>>> So I think the risk is low.  We could probably reduce the risk by asking Stefan
>>>> where these lists came from, and if he is aware of any IP issues with them.
>>>> To the extent that we collect information and form opinions on issues like
>>>> this, I recommend adding a file to the SVN, not necessarily included in the build,
>>>> called something like license-notice-research.txt, just to record these things
>>>> in one place, so we can find it quickly if a question comes up later and we want
>>>> to remember what we did and why.
>>>> -Marshall
>>>> On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
>>>>> On 08.02.2016, at 11:11, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>> On 08.02.2016 at 10:44, Richard Eckart de Castilho wrote:
>>>>>>> On 08.02.2016, at 10:11, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>> Hi,
>>>>>>>> On 07.02.2016 at 19:52, Richard Eckart de Castilho wrote:
>>>>>>>>> Checks:
>>>>>>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies - OK
>>>>>>>>> - the FirstNames.txt file in GermanNovels is quite large 90k, but no source info/license for this file is given anywhere: doesn't seem OK
>>>>>>>>> - stopping checks at this point for the moment
>>>>>>>> What kind of source info/license would you expect? The file
>>>>>>>> with the other files was contributed as part of UIMA-3926 with an ICLA
>>>>>>>> present. I do not remember if I knew the source of the file by then, but
>>>>>>>> I remember that I had some conversations with the contributor that the
>>>>>>>> files need to be OK for a contribution. That's the reason why the
>>>>>>>> test/dev data was not contributed since it had some CC license that was
>>>>>>>> problematic.
>>>>>>> The other dev/test data doesn't seem problematic at all, but the 90k names
>>>>>>> file seems non-trivial. If it were CC, the license would need to be mentioned
>>>>>>> in a LICENSE.txt file. My suggestion would be to simply strip the file down
>>>>>>> to the names needed for the example.
>>>>>> If I have to guess I'd say that the names have been crawled and that
>>>>>> there is no original source file with a specific license.
>>>>>> The novels had the CC license last time I checked. I do not remember
>>>>>> all, but when I looked it up in Apache's third party pages, it indicated
>>>>>> that it was not possible to include them. However, I could have been
>>>>>> Hmm... it depends on what is needed for the example. The initial examples
>>>>>> were 10-20 novels. I could strip it down to the first names of one
>>>>>> I remember to be part of the dev set, but is that really necessary?
>>>>> Let's see what Marshall thinks about it.
>>>>> -- Richard
