commons-dev mailing list archives

From Valentin Waeselynck <valentinwaesely...@yahoo.fr>
Subject Re: [Laboratory Toolkit] proposing a new Apache Commons component
Date Wed, 18 Dec 2013 00:03:10 GMT
Hello to all,

As requested, I have restructured the Laboratory Toolkit as a Maven project and added some
code samples to show its use cases. (Sorry for the delay, I've had a rough couple of weeks.)

In the code samples, you will find the following examples:
    - accounting: the simplest example, in the field of enterprise finance. It is an
application that takes as input the accounting documents of a company (the Balance Sheet
and the Income Statement) and calculates from them a variety of financial quantities, such
as the Net Income, and profitability ratios, such as the Return On Equity.

    - integer: a more mathematical example, in the fields of arithmetic and algebra.
The base data is simply a positive integer; the application computes things like the set of
divisors of this integer, then some more advanced algebraic objects such as the ring of integers
modulo this integer, its canonical decomposition given by the Chinese Remainder Theorem and the
isomorphism between them, up to a set of generators of its group of invertible elements.

    - search-engine: implements a basic search engine on a corpus of documents, using
classical ranking functions such as BM25 or TF-IDF in a Vector Space Model. This one is closer
to reality; it is directly inspired by an "Introduction to Big Data" class I just took.
    - text: illustrates the HTML guide in the repository; it is not meant to be read on its own.
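
To give an idea of the kind of formula the search-engine example is built around, here is a
minimal, standalone sketch of BM25 scoring in plain Java. It is not taken from the repository
and does not use the Laboratory Toolkit; the class name Bm25Sketch, the parameter values
k1 = 1.2 and b = 0.75, and the smoothed IDF variant are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative BM25 scorer; NOT code from the Laboratory Toolkit repository. */
class Bm25Sketch {
    // Commonly used defaults for BM25's free parameters (an assumption of this sketch).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Number of occurrences of a term in a tokenized document. */
    static long termFrequency(String term, List<String> doc) {
        return doc.stream().filter(term::equals).count();
    }

    /** Smoothed inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(String term, List<List<String>> corpus) {
        long n = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log(1.0 + (corpus.size() - n + 0.5) / (n + 0.5));
    }

    /** BM25 score of one document for a query, relative to the given corpus. */
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(1.0);
        double total = 0.0;
        for (String term : query) {
            long tf = termFrequency(term, doc);
            if (tf == 0) {
                continue; // a term absent from the document contributes nothing
            }
            double lengthNorm = K1 * (1.0 - B + B * doc.size() / avgLen);
            total += idf(term, corpus) * tf * (K1 + 1.0) / (tf + lengthNorm);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("apache", "commons", "library"),
            Arrays.asList("apache", "apache", "commons"),
            Arrays.asList("search", "engine", "ranking"));
        List<String> query = Arrays.asList("apache");
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + score(query, doc, corpus));
        }
    }
}
```

Each intermediate quantity here (term frequency, IDF, length normalization, final score) is
the sort of step that could be expressed as a separate analysis.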
 
Of course, these are only toy examples, and I don't have the ambition of replacing software
that already does this very well; but I hope they're informative enough about this API's genericity
and possible applications. 

In the real world, this API originated from an application I developed that generates
advertisement snippets from HTML product pages, in which I used this toolkit extensively
for extracting and ranking keywords from the HTML document; but I can't show you that code.
My experience is that it makes it much easier to take a sequence of formulas written on
paper and implement them one by one.

In my opinion, the main features of this API are :
    - making some algorithms easier to develop by expressing their concepts in terms of
analyses and results (as an analogy, think of how the Executor API lets you describe concurrent
algorithms in terms of tasks and executors)
    - built-in support for the Intercept, Cache and Invoke pattern
    - enforcing a modular architecture without hindering the communication between the
modules (i.e. the Analysis objects)
    - through the use of Laboratory objects, emulating a new scope that is an alternative
to class scope or method scope.
    - separating the concerns of declaring how the steps of an algorithm are computed and
externally requesting their results.
    - encouraging the exploration of a space of strategies and parameters for the algorithms,
by concentrating all these parameters in one place (the Equipment object).
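
The general idea behind the first points (steps declared once, results requested from outside,
each result computed at most once) can be sketched in a few lines of plain Java. This is only
an illustration written for this message: the names LabSketch, Analysis, Lab and resultOf are
hypothetical and are not the actual classes or methods of the Laboratory Toolkit, and
"net income = revenue - expenses" is a deliberate simplification of the accounting example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only; these are NOT the Laboratory Toolkit's classes. */
class LabSketch {

    /** A named step of an algorithm; its formula may request the results of other steps. */
    static final class Analysis<T> {
        final String name;
        final Function<Lab, T> formula;

        Analysis(String name, Function<Lab, T> formula) {
            this.name = name;
            this.formula = formula;
        }
    }

    /** Holds the input data and caches every result the first time it is requested. */
    static final class Lab {
        final Map<String, Double> inputs;
        private final Map<Analysis<?>, Object> cache = new HashMap<>();
        int computations = 0; // counts how many formulas were actually evaluated

        Lab(Map<String, Double> inputs) {
            this.inputs = inputs;
        }

        @SuppressWarnings("unchecked")
        <T> T resultOf(Analysis<T> analysis) {
            // Explicit check-then-put: a formula may call resultOf recursively,
            // which HashMap.computeIfAbsent does not allow.
            if (!cache.containsKey(analysis)) {
                computations++;
                cache.put(analysis, analysis.formula.apply(this));
            }
            return (T) cache.get(analysis);
        }
    }

    // Declaring the steps: each one is a formula over the inputs or over other results.
    static final Analysis<Double> NET_INCOME = new Analysis<>("netIncome",
        lab -> lab.inputs.get("revenue") - lab.inputs.get("expenses"));

    static final Analysis<Double> RETURN_ON_EQUITY = new Analysis<>("returnOnEquity",
        lab -> lab.resultOf(NET_INCOME) / lab.inputs.get("equity"));

    public static void main(String[] args) {
        Map<String, Double> inputs = new HashMap<>();
        inputs.put("revenue", 500.0);
        inputs.put("expenses", 380.0);
        inputs.put("equity", 1000.0);

        // Requesting results from outside, without knowing how they are computed.
        Lab lab = new Lab(inputs);
        System.out.println("ROE        = " + lab.resultOf(RETURN_ON_EQUITY));
        System.out.println("Net income = " + lab.resultOf(NET_INCOME));
        System.out.println("Formulas evaluated: " + lab.computations);
    }
}
```

The caller only names the result it wants; the dependency of the Return On Equity on the Net
Income is resolved, and cached, inside the Lab object.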

I hope you'll like it, and I'm always eager for feedback!

With best wishes,


Valentin WAESELYNCK
Third-year student at École Polytechnique
valentin.waeselynck@polytechnique.edu
+33 6 80 84 99 54




On Friday, 6 December 2013 at 14:21, Benedikt Ritter <britter@apache.org> wrote:
 
2013/12/5 Christian Grobmeier <grobmeier@gmail.com>

> On 5 Dec 2013, at 13:44, Valentin Waeselynck wrote:
>
>  Should I keep answering to the whole ML about this, or only to you?
>>
>
> Keep the mailing list in loop. There might be others interested in this.
> In addition, the ML documents the history, which is why we always use it.


Thanks for chiming in on this, Christian!

Valentin: Before you invest a lot of work to get maven and some tests in
place, let us start with the example code, so that people can decide if
your project fits into commons.

Benedikt


>
>> Best regards,
>>
>>
>> Valentin WAESELYNCK
>> Third-year student at École Polytechnique
>> valentin.waeselynck@polytechnique.edu
>> +33 6 80 84 99 54
>>
>>
>>
>>
>> On Thursday, 5 December 2013 at 8:53, Benedikt Ritter <britter@apache.org> wrote:
>>
>> Bonjour Valentin,
>>
>> welcome to the ML. Good to hear that you've decided to join the open
>> source movement.
>>
>> First of all, it would really help if you could elaborate some use cases
>> for your library. You're talking about building algorithms. What kind of
>> algorithms can be built with Laboratory Toolkit? Can you give some code
>> examples (just create some gists at github that show the use of
>> Laboratory Toolkit)?
>>
>> There is an important requirement for any code to be incorporated into the
>> Apache code base:
>> - the intellectual property (IP) of the code has to be owned completely by
>> the contributor. You said that you've built the Laboratory Toolkit for a
>> research project. Are you sure that you own the code? Or is it the result
>> of your work and thus is owned by your employer?
>>
>> At commons we have some additional requirements:
>> - There should be a group of people who are willing to maintain the code
>> - Commons components should in general not depend on any other libraries
>> - Commons uses maven as the main build tool, so there should be a maven
>> build available
>> - The code should have a good test coverage
>>
>> You have to figure the IP issue out on your own first.
>> After that, if the community decides to accept this contribution, we can
>> work on the commons requirements.
>>
>> Best regards and thank you,
>> Benedikt
>>
>>
>>
>> 2013/12/4 Valentin Waeselynck <valentinwaeselynck@yahoo.fr>
>>
>>    Hello to all,
>>>
>>> As part of a small research project (which combined techniques of
>>> text-mining, machine-learning and natural language generation, not that
>>> it's really relevant) I have come to design a small JavaSE library, which
>>> I'm for the moment calling the Laboratory Toolkit, for developing our
>>> algorithms in a comfortable and flexible manner.
>>>
>>> I have found it to be quite generic and reusable, not tied to any
>>> application domain, while still being rather accessible, and small enough
>>> to comprehend it easily. Therefore, I would like to propose it as a new
>>> Apache Commons component. I would be very grateful if one of you could
>>> tell me what steps I should follow for that purpose.
>>>
>>> I have uploaded it on Github :
>>> https://github.com/vvvvalvalval/Laboratory-Toolkit.git. There you may
>>> find the sources, the javadoc, and a small guide I have started to write
>>> for it (also attached to this mail).
>>>
>>> Of course, I am very open to feedback and criticism from you. The
>>> last thing I want is to publish an immature or useless component; nor do
>>> I take a positive answer from you for granted.
>>>
>>> If I have failed to follow the proper procedure to propose a new
>>> candidate component, it is not on purpose, and I apologize in advance.
>>>
>>> Whatever your reply, and since I have the chance, I would also like to
>>> congratulate you for all your work. The Apache Commons components have
>>> really been lifesavers to me, on many occasions.
>>>
>>> With best wishes,
>>>
>>> Valentin WAESELYNCK
>>> Third-year student at École Polytechnique
>>> valentin.waeselynck@polytechnique.edu
>>> +33 6 80 84 99 54
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> --
>> http://people.apache.org/~britter/
>> http://www.systemoutprintln.de/
>> http://twitter.com/BenediktRitter
>> http://github.com/britter
>>
>
>
> ---
> http://www.grobmeier.de
> @grobmeier
> GPG: 0xA5CC90DB



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter