flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From gaborhermann <...@git.apache.org>
Subject [GitHub] flink issue #2819: [FLINK-4961] [ml] SGD for Matrix Factorization (WIP)
Date Wed, 23 Nov 2016 18:50:33 GMT
Github user gaborhermann commented on the issue:

    Hello Theodore,
    Thanks for taking a look at this PR!
    1. No problem. Some performance evaluations might help us in this case too. I don't believe
it should have much effect on the performance anyway as a 3-way join is only used outside
the iteration.
    2. I really like the idea of doing a performance evaluation! I'm not exactly sure how
to do this with a `join` instead of a `coGroup`, so let me sketch an implementation before
elaborating on the pros/cons. (It was more straightforward to use `coGroup`.)
    3. The benefit of using more blocks is to use less memory, just as in the ALS algorithm,
with the disadvantage of using more network. However, more blocks here means much more network
and bookkeeping compared to ALS, because there are more Flink iterations. I've investigated
this a bit more and found that the main "bottleneck" is the maximum size of the sort-buffer,
which was around 100 MB with around 40 GB TaskManager memory, and large matrix blocks do not
fit in. So even given enough memory, we cannot use large blocks. Unfortunately we cannot configure
the maximum size of the sort-buffer, and I would not like to change the underlying code in
the core or runtime, but I tried a workaround which might work (splitting the factor blocks
to subblocks). I'll just need to run some measurements. If this workaround turns out fine,
then it should be okay to go on with this and give instructions in the docs.
    4. Okay, I agree.
    I hope we could generalize this non-overlapping blocking to other optimization problems
:) I'll take a look at the paper you linked!

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message