flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2452) Add a playcount threshold to the MusicProfiles example
Date Fri, 31 Jul 2015 23:43:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650010#comment-14650010 ]

ASF GitHub Bot commented on FLINK-2452:
---------------------------------------

GitHub user vasia opened a pull request:

    https://github.com/apache/flink/pull/968

    [FLINK-2452] [Gelly] adds a playcount threshold to the MusicProfiles example

    This PR adds a user-defined parameter to the MusicProfiles example that filters out songs
    that a user has listened to only a few times. Essentially, it is a threshold for playcount,
    above which a user is considered to like a song.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vasia/flink music-profiles

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #968
    
----
commit c0c8463521912d021c392c2c5edc254fee267eb8
Author: vasia <vasia@apache.org>
Date:   2015-07-31T20:12:18Z

    [FLINK-2452] [Gelly] adds a playcount threshold to the MusicProfiles example

----


> Add a playcount threshold to the MusicProfiles example
> ------------------------------------------------------
>
>                 Key: FLINK-2452
>                 URL: https://issues.apache.org/jira/browse/FLINK-2452
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly
>    Affects Versions: 0.10
>            Reporter: Vasia Kalavri
>            Assignee: Vasia Kalavri
>            Priority: Minor
>
> In the MusicProfiles example, when creating the user-user similarity graph, an edge is
> created between any 2 users that have listened to the same song (even if once). Depending
> on the input data, this might produce a projection graph with many more edges than the original
> user-song graph.
> To make this computation more efficient, this issue proposes adding a user-defined parameter
> that filters out songs that a user has listened to only a few times. Essentially, it is a
> threshold for playcount, above which a user is considered to like a song.
> For reference, with a threshold value of 30, the whole Last.fm dataset is analyzed on
> my laptop in a few minutes, while no threshold results in a runtime of several hours.
> There are many solutions to this problem, but since this is just an example (not a library
> method), I think that keeping it simple is important.
> Thanks to [~andralungu] for spotting the inefficiency!
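
The effect of the threshold described above can be illustrated with a minimal sketch in plain Python. This is not the Gelly code from the pull request: the function name `similarity_edges`, the triplet layout `(user, song, playcount)`, the sample data, and the strict `>` comparison are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def similarity_edges(triplets, threshold=0):
    """Project (user, song, playcount) triplets onto a user-user graph.

    Hypothetical helper, not part of Flink or Gelly. A user-song pair is
    kept only if playcount > threshold (assumed reading of "above which").
    """
    listeners = defaultdict(set)
    for user, song, playcount in triplets:
        if playcount > threshold:
            listeners[song].add(user)

    # Every pair of users sharing a kept song becomes one undirected edge.
    edges = set()
    for users in listeners.values():
        edges.update(combinations(sorted(users), 2))
    return edges


# Made-up sample data: three users, two songs.
triplets = [
    ("u1", "s1", 40), ("u2", "s1", 35), ("u3", "s1", 1),
    ("u1", "s2", 2), ("u3", "s2", 50),
]

print(sorted(similarity_edges(triplets)))                # no threshold
print(sorted(similarity_edges(triplets, threshold=30)))  # threshold of 30
```

On this sample, the run without a threshold connects all three users, while a threshold of 30 leaves only the edge between `u1` and `u2`. A song with n listeners contributes up to n(n-1)/2 edges, which is why dropping low-playcount pairs before the projection can shrink the similarity graph so sharply.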



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
