flink-issues mailing list archives

From rbraeunlich <...@git.apache.org>
Subject [GitHub] flink pull request: basic TfidfTransformer
Date Tue, 26 May 2015 21:48:44 GMT
GitHub user rbraeunlich opened a pull request:


    basic TfidfTransformer

    Hi everybody,
    for [FLINK-1999](https://issues.apache.org/jira/browse/FLINK-1999) we created a first implementation of a TfIdfTransformer.
    One problem remains: applying modulo after hashing causes collisions.
    Nevertheless, we would be glad to receive comments on our implementation.
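
The collision problem described above can be illustrated with a minimal Python sketch. This is not the code from this pull request: the function names, the use of CRC32 as the hash, and the feature count of 4 are all assumptions chosen for illustration. It shows that reducing a hash value modulo a small number of features forces distinct terms into the same index as soon as there are more terms than indices.

```python
import zlib
from collections import Counter


def bucket(term, num_features):
    """Hash a term and reduce it modulo num_features (hypothetical helper).

    CRC32 is used only because it is deterministic across runs;
    it is an assumption, not the hash used in the pull request.
    """
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashed_term_counts(tokens, num_features):
    """Count term occurrences per hashed index (a sparse index -> count map)."""
    counts = Counter()
    for token in tokens:
        counts[bucket(token, num_features)] += 1
    return dict(counts)


def colliding_terms(vocabulary, num_features):
    """Return only the indices that two or more distinct terms share."""
    groups = {}
    for term in set(vocabulary):
        groups.setdefault(bucket(term, num_features), []).append(term)
    return {index: sorted(terms) for index, terms in groups.items() if len(terms) > 1}


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog".split()
    # Eight distinct terms, four indices: at least one index must be shared.
    print(hashed_term_counts(doc, 4))
    print(colliding_terms(doc, 4))
```

Terms that share an index have their counts added together, so their term frequencies can no longer be told apart. Choosing a larger number of features makes collisions rarer but does not rule them out.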

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rbraeunlich/flink tfidf

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #730

commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich <r.braeunlich@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich <r.braeunlich@gmail.com>
Date:   2015-05-16T14:38:28Z

    fix import in TfIdfTransformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich <r.braeunlich@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012 <vsldimov@gmail.com>
Date:   2015-05-17T11:42:40Z

    Changes merged
    Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012 <vsldimov@gmail.com>
Date:   2015-05-17T17:40:00Z

    Small fix of the test class. (The sparse vector contains index -> value tuples, so we have to take only the value and not the whole tuple for the comparison.)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012 <vsldimov@gmail.com>
Date:   2015-05-18T06:41:58Z

    Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012 <vsldimov@gmail.com>
Date:   2015-05-18T11:35:43Z

    TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012 <vsldimov@gmail.com>
Date:   2015-05-19T20:44:31Z

    Transformer improved: now we get SparseVector for each document that contains all words.


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
