flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4613) Extend ALS to handle implicit feedback datasets
Date Fri, 23 Sep 2016 14:18:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516571#comment-15516571 ]

ASF GitHub Bot commented on FLINK-4613:
---------------------------------------

Github user gaborhermann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2542#discussion_r80253321
  
    --- Diff: docs/dev/libs/ml/als.md ---
    @@ -49,6 +49,18 @@ By applying this step alternately to the matrices $U$ and $V$, we can iterativel
     
     The matrix $R$ is given in its sparse representation as a tuple of $(i, j, r)$ where $i$ denotes the row index, $j$ the column index and $r$ is the matrix value at position $(i,j)$.
     
    +An alternative model can be used for _implicit feedback_ datasets.
    +These datasets only contain implicit feedback from the user,
    +in contrast to datasets with explicit feedback like movie ratings.
    +For example, users watch videos on a website and the website monitors which user
    +viewed which video, so the users only provide their preference implicitly.
    +In these cases the feedback should not be treated as a
    +rating, but rather as evidence that the user prefers that item.
    +Thus, for implicit feedback datasets there is a slightly different
    +minimization problem to solve (see [Hu et al.](http://dx.doi.org/10.1109/ICDM.2008.22) for details).
    +Flink supports both explicit and implicit ALS,
    +and the choice between the two can be set in the parameters.
    +
    --- End diff --
    
    Okay, I added
    "The implementation is based on the Apache Spark implementation of implicit ALS."
    and referred to the relevant file in the Spark codebase.
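For reference, the implicit model of Hu et al. maps each raw observation r_ui to two values. The first is a binary preference p_ui, which is 1 if r_ui > 0 and 0 otherwise. The second is a confidence c_ui = 1 + alpha * r_ui. The model then minimizes the sum over all (u, i) pairs of c_ui * (p_ui - x_u^T y_i)^2, plus lambda times the squared norms of all user and item factors. The Scala sketch below illustrates only the per-pair terms of that objective. The object and method names are illustrative assumptions, not part of Flink's API.

```scala
// Minimal sketch of the implicit-feedback transformation from Hu et al. (2008).
// Each raw observation r_ui (e.g. minutes user u listened to artist i) becomes
// a binary preference p_ui and a confidence weight c_ui = 1 + alpha * r_ui.
// Names are hypothetical and for illustration only; this is not Flink's API.
object ImplicitFeedback {

  /** Binary preference: 1.0 if any interaction was observed, else 0.0. */
  def preference(r: Double): Double = if (r > 0.0) 1.0 else 0.0

  /** Confidence grows linearly with the observed amount of interaction. */
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * r

  /**
   * Weighted squared error c_ui * (p_ui - x_u . y_i)^2 for one (user, item) pair.
   * The implicit objective sums this over ALL pairs, including unobserved ones
   * (r_ui = 0, which get p_ui = 0 and c_ui = 1); regularization is not included here.
   */
  def pairLoss(r: Double, alpha: Double, xu: Array[Double], yi: Array[Double]): Double = {
    val predicted = xu.zip(yi).map { case (a, b) => a * b }.sum
    val err = preference(r) - predicted
    confidence(r, alpha) * err * err
  }
}
```

Unobserved pairs still contribute to the loss with confidence 1. Because of this, the implicit problem cannot simply skip missing entries the way the explicit formulation does.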



> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> The Alternating Least Squares implementation should be extended to handle _implicit feedback_
> datasets. These datasets do not contain explicit ratings by users; instead, they are built by
> collecting user behavior (e.g. user listened to artist X for Y minutes), and they require
> a slightly different optimization objective. See [Hu et al|http://dx.doi.org/10.1109/ICDM.2008.22] for details.
> We do not need to modify much in the original ALS algorithm. See the [Spark ALS implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala],
> which could be a basis for this extension. Only the factor-update step is modified, and
> most of the changes are in the local parts of the algorithm (i.e. UDFs). In fact, the only
> non-local modification is precomputing the matrix product Y^T * Y and broadcasting
> it to all the nodes, which we can do with broadcast DataSets.
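The remark about precomputing Y^T * Y follows from the closed-form user update in Hu et al. That update is x_u = (Y^T C^u Y + lambda I)^-1 Y^T C^u p(u), where C^u is the diagonal matrix of user u's confidences. The paper rewrites Y^T C^u Y as Y^T Y + Y^T (C^u - I) Y. The second term is nonzero only for items the user interacted with, so each user needs the shared Y^T Y plus a correction over their own items. That shared matrix is the one value that must be distributed. In a Flink job it would presumably be the broadcast DataSet the issue mentions.

The Scala sketch below shows one user's update in isolation. It uses plain arrays and a hand-written Cholesky solver so that it has no dependencies. It is not Flink's or Spark's code, and all names are hypothetical.

```scala
// Hypothetical sketch of the implicit-ALS user update from Hu et al. (2008),
// using a precomputed YtY (the matrix the issue proposes to broadcast).
object ImplicitAlsUpdate {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]]

  /** Y^T Y (k x k) as a sum of outer products over all item factor vectors. */
  def gram(itemFactors: Seq[Vec], k: Int): Mat = {
    val g = Array.ofDim[Double](k, k)
    for (y <- itemFactors; a <- 0 until k; b <- 0 until k) g(a)(b) += y(a) * y(b)
    g
  }

  /** Solves A x = b for symmetric positive-definite A via Cholesky (A = L L^T). */
  def choleskySolve(a: Mat, b: Vec): Vec = {
    val n = b.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(p => l(i)(p) * l(j)(p)).sum
      if (i == j) l(i)(i) = math.sqrt(a(i)(i) - s)
      else l(i)(j) = (a(i)(j) - s) / l(j)(j)
    }
    // Forward substitution: L z = b
    val z = new Array[Double](n)
    for (i <- 0 until n) z(i) = (b(i) - (0 until i).map(p => l(i)(p) * z(p)).sum) / l(i)(i)
    // Back substitution: L^T x = z
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1)
      x(i) = (z(i) - ((i + 1) until n).map(p => l(p)(i) * x(p)).sum) / l(i)(i)
    x
  }

  /**
   * One user's factor update:
   *   x_u = (YtY + sum_obs (c_ui - 1) y_i y_i^T + lambda I)^-1 * sum_obs c_ui y_i
   * `observed` holds (item factor y_i, raw value r_ui) only for items with r_ui > 0,
   * so p_ui = 1 for every entry; unobserved items are covered by YtY alone.
   */
  def updateUser(ytY: Mat, observed: Seq[(Vec, Double)], alpha: Double, lambda: Double): Vec = {
    val k = ytY.length
    // Start from the shared YtY plus the regularization term lambda * I.
    val a = Array.tabulate(k, k)((p, q) => ytY(p)(q) + (if (p == q) lambda else 0.0))
    val rhs = new Array[Double](k)
    for ((y, r) <- observed) {
      val c = 1.0 + alpha * r
      for (p <- 0 until k) {
        rhs(p) += c * y(p) // Y^T C^u p(u) term, with p_ui = 1
        for (q <- 0 until k) a(p)(q) += (c - 1.0) * y(p) * y(q) // Y^T (C^u - I) Y term
      }
    }
    choleskySolve(a, rhs)
  }
}
```

Take a single latent factor (k = 1) with item factors 1.0 and 2.0, so that YtY is 5.0. A user who interacted only with the first item (r = 1, alpha = 1, lambda = 1) gets x_u = 2 / (5 + 1 + 1) = 2/7. The system is positive definite whenever lambda > 0, because every confidence is at least 1. This is why a Cholesky solve is a reasonable choice here.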



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
