Thank you Dylan. The paper seems interesting. The abstract reads as follows
: "We consider the problem of doing fast and reliable estimation of the
number z of nonzero entries in a sparse boolean matrix product"...
I think they maybe doing this for boolean matrices. The code that I pointed
to in SystemML is used for all matrices.
I am not sure if and how their technique translates to nonboolean matrices.
Also, I am asking for an explanation of what is implemented as opposed to
what can be.
I was wondering if the neat looking formula has been gotten from some
literature somewhere?
Nakul
On Tue, Jun 20, 2017 at 1:42 PM, Dylan Hutchison <dhutchis@cs.washington.edu
> wrote:
> In case it is helpful, this 2010 paper
> <https://www.semanticscholar.org/paper/BetterSize
> EstimationforSparseMatrixProductsAmossenCampagna/
> 52f70406bb4d887e1b79af81a746184c5723848a>
> is my favorite for estimating the nnz of a matrix product with high
> confidence. The algorithm is much more involved than the simple SystemML
> calculation, because it looks at samples from the actual data.
>
> Here are some notes I have on the paper:
>
> There is a sketch algorithm [2] that gives a (1 + ε) approximation z̃ to z
> for any ε > 4 / (z^.25) in O(m)
> time and with a bound on the number of I/Os. In relational algebra, this is
> estimating
> π_{ik}( A(i, j) ⋈ B(j, k) ). The idea behind the algorithm is finding,
> for some tuning parameter
> k that varies with z, the ksmallest value of a hash function h(i; k). The
> larger this value is, the more
> is and ks that match for a given j. Repeat this for all js. Different (i,
> k)s contribute to different entries
> in the result matrix and have different hash values. A sketch for this
> algorithm can be incrementally
> maintained via independent samples on is and ks. Computing the z̃ estimate
> from the sample takes o(n)
> time for large enough z. The larger z is, the smaller a sample size we
> need. (Need large samples for
> small z.) (Everything above assumes the zero product property and
> zerosumfree).
>
>
>
> On Tue, Jun 20, 2017 at 12:06 PM, Nakul Jindal <nakul02@gmail.com> wrote:
>
> > Hi,
> >
> > There is code in systemml to “estimate” the output sparsity of a matrix
> > multiplication operation between two sparse matrices.
> > Is there a citation for this? If not, have we done any tests to figure
> out
> > how good these estimates are?
> > https://github.com/nakul02/systemml/blob/df8d4a63d8d09cae94b
> > 6ca2634e31da554302c72/src/main/java/org/apache/sysml/
> > hops/OptimizerUtils.java#L944L953
> >
> > Thanks,
> > Nakul
> >
>
