systemml-dev mailing list archives

From Matthias Boehm <mboe...@gmail.com>
Subject Re: distributed cholesky on systemml
Date Sun, 22 Apr 2018 06:39:11 GMT
just as a quick update: this issue has now been fixed in SystemML
master - it was essentially a missing guard for recursive functions
when checking for unary size-preserving functions during
inter-procedural analysis (IPA).

However, while working with this recursive cholesky function I came to
the conclusion that it may need some rework. The current top-down,
depth-first approach is inherently sequential. This is partially
unnecessary: for the recursive function U_triangular_inv (which is
called many more times than cholesky), the blocks at each level are
independent. Therefore, we should look into a bottom-up, breadth-first
approach that parallelizes over the blocks in each level, which could
be done via parfor at script level.

Regards,
Matthias
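
The bottom-up, breadth-first idea described above for U_triangular_inv can be sketched as follows. This is an illustrative Python sketch, not the DML script from the SystemML repository. The function names, the list-of-lists matrix representation, and the thread pool standing in for parfor are all assumptions made for the example. It relies on the standard identity that the inverse of an upper-triangular matrix [[A, B], [0, C]] is [[A^-1, -A^-1 B C^-1], [0, C^-1]], so all diagonal blocks of one level can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(X, Y):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def invert_leaf(U):
    """Invert a small upper-triangular block by back substitution."""
    n = len(U)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / U[i][i]
    return inv

def merge(U, lo, mid, hi, A_inv, C_inv):
    """Combine the inverses of U[lo:mid, lo:mid] and U[mid:hi, mid:hi]."""
    B = [row[mid:hi] for row in U[lo:mid]]
    top_right = [[-v for v in row] for row in matmul(matmul(A_inv, B), C_inv)]
    top = [a + t for a, t in zip(A_inv, top_right)]
    bottom = [[0.0] * (mid - lo) + c for c in C_inv]
    return top + bottom

def triangular_inv_bottom_up(U, block_size, workers=4):
    """Invert a non-empty upper-triangular matrix level by level."""
    n = len(U)
    # Level 0: contiguous diagonal blocks, each given as a (lo, hi) index range.
    ranges = [(lo, min(lo + block_size, n)) for lo in range(0, n, block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All leaf blocks are independent, so they are mapped over the pool
        # (this plays the role a parfor loop would play at script level).
        invs = list(pool.map(
            lambda r: invert_leaf([row[r[0]:r[1]] for row in U[r[0]:r[1]]]),
            ranges))
        while len(ranges) > 1:
            # Merge adjacent pairs; the merges within one level are independent.
            pairs = range(0, len(ranges) - 1, 2)
            merged = list(pool.map(
                lambda i: merge(U, ranges[i][0], ranges[i][1], ranges[i + 1][1],
                                invs[i], invs[i + 1]),
                pairs))
            new_ranges = [(ranges[i][0], ranges[i + 1][1]) for i in pairs]
            if len(ranges) % 2 == 1:
                # An unpaired trailing block is carried to the next level.
                merged.append(invs[-1])
                new_ranges.append(ranges[-1])
            ranges, invs = new_ranges, merged
    return invs[0]
```

Python threads do not give real parallel speedup for this arithmetic; the pool is only there to make the per-level independence explicit. The top-down variant would instead recurse into the two diagonal halves one after the other before combining them.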

On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboehm7@gmail.com> wrote:
> thanks for catching this - I just ran a toy example and this seems to
> be a rewrite issue (there are specific right indexing rewrites that
> collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U which
> helps for large distributed matrices). As a workaround, you can set
> "sysml.optlevel" to 1 (instead of the default 2; level 1 disables all
> rewrites), which worked fine for me. I'll fix this later today. Also
> I'll fix the naming from "Choleskey" to "Cholesky". Thanks again.
>
> Regards,
> Matthias
>
>
> On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan.pu@gmail.com> wrote:
>> Hi Matthias,
>>
>> Thanks for the fast response and detailed information. This is really
>> helpful.
>>
>> I just tried to run it, and was tracing down an indexing bug that can be
>> reproduced by simply running the test script for triangle solve [1]:
>> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid values for
>> matrix indexing: [1667:3333,1:1666] must be within matrix dimensions
>> [1000,1000]
>>
>>
>> Am I missing some configuration here?
>>
>>
>> [1]
>> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
>>
>>
>> Best,
>> Qifan
>>
>>
>> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboehm7@gmail.com> wrote:
>>>
>>> Hi Qifan,
>>>
>>> thanks for your feedback. You're right, the builtin functions
>>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently only
>>> supported as single-node operations because they're still implemented
>>> via Apache commons.math.
>>>
>>> However, there is an experimental script for distributed cholesky [1]
>>> which uses a recursive approach (with operations that allow for
>>> automatic distributed computation) for matrices larger than a
>>> user-defined block size. Once blocks become small enough, we again use
>>> the builtin cholesky. Graduating this script would require a broader
>>> set of experiments (and potential improvements) but it simply did not
>>> have the highest priority so far. You might want to give it a try
>>> though.
>>>
>>> Thanks again for your feedback - we'll consider a higher priority for
>>> these distributed operations when discussing the roadmap for the next
>>> releases.
>>>
>>> [1]
>>> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
>>>
>>> Regards,
>>> Matthias
>>>
>>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan.pu@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I would love to do distributed cholesky on a large matrix with SystemML.
>>> > I found two related jiras (SYSTEMML-1213, SYSTEMML-1163), but AFAIK, this
>>> > is currently not implemented? I just wanted to check.
>>> >
>>> > Best,
>>> > Qifan
>>
>>
