Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "SpMV" page has been changed by Mikalai Parafeniuk:
http://wiki.apache.org/hama/SpMV?action=diff&rev1=9&rev2=10
== Distributed Sparse Matrix-Vector Multiplication on Hama ==
=== Introduction ===
In this article we will explore the problem of sparse matrix-vector multiplication, which can be written in the form u = Av. Many computational algorithms spend a large share of their time solving large systems of linear equations. In general, a system of linear equations can be represented in matrix form as Ax = b, where A is a matrix with n rows and n columns, b is a vector of size n, and x is the unknown solution vector we are searching for. Some approaches to solving linear systems are iterative in nature. Assume we know an initial approximation x = x_0. We then rewrite the system in the form x_n = B x_(n-1) + c, where c is a vector of size n, and compute successive approximations of x until convergence. In the real world most matrices contain a relatively small number of nonzero items compared to the total number of matrix items. Such matrices are called sparse; matrices filled mostly with nonzero items are called dense. Sparse matrices arise when each variable from a set is connected with only a small subset of the variables (for example, in the differential equation of heat conduction). So, this page describes the problem of sparse matrix-vector multiplication (SpMV) using the Bulk Synchronous Parallel (BSP) model implemented in the Apache Hama project. As shown above, SpMV can be used in different iterative solvers for systems of linear equations.
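For concreteness, the iteration scheme x_n = B x_(n-1) + c can be sketched in plain Java. This is an illustration only, not code from the SpMV implementation; the class and method names (`IterativeSolver`, `step`, `solve`) are made up for this example. Each iteration is exactly one sparse matrix-vector product plus a vector addition, which is why SpMV is the kernel of such solvers.

```java
// Illustrative sketch: iterate x = B*x + c until convergence.
// B is stored row-wise as (column index, value) pairs, as sparse
// matrices are represented later on this page.
public class IterativeSolver {

    // One iteration: y = B*x + c, with B given as sparse rows.
    static double[] step(int[][] colIdx, double[][] vals, double[] x, double[] c) {
        double[] y = new double[x.length];
        for (int row = 0; row < x.length; row++) {
            double sum = c[row];
            for (int k = 0; k < colIdx[row].length; k++) {
                sum += vals[row][k] * x[colIdx[row][k]];
            }
            y[row] = sum;
        }
        return y;
    }

    // Repeat until successive iterates stop changing.
    static double[] solve(int[][] colIdx, double[][] vals, double[] c, double eps) {
        double[] x = new double[c.length]; // initial approximation x_0 = 0
        while (true) {
            double[] next = step(colIdx, vals, x, c);
            double diff = 0;
            for (int i = 0; i < x.length; i++) {
                diff = Math.max(diff, Math.abs(next[i] - x[i]));
            }
            x = next;
            if (diff < eps) return x;
        }
    }

    public static void main(String[] args) {
        // B = [[0.5, 0], [0, 0.5]], c = [1, 1]; the fixed point is x = [2, 2].
        int[][] idx = {{0}, {1}};
        double[][] vals = {{0.5}, {0.5}};
        double[] x = solve(idx, vals, new double[]{1, 1}, 1e-9);
        System.out.println(x[0] + " " + x[1]);
    }
}
```

Convergence here relies on the spectral radius of B being below 1, which is the usual precondition for such fixed-point schemes.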
The Bulk Synchronous Parallel model proposes its own way of parallelizing programs. We specify an input path for the problem and a number of peers. The framework reads the input and divides it between peers. Peers can be processors, threads, separate machines, or different nodes of a cloud. A BSP algorithm is divided into a sequence of supersteps, and barrier synchronization of all peers is performed after each superstep. Apache Hama, an implementation of BSP, contains primitives for determining the peer number, communicating with other peers through different communication primitives, and optimizing communication between peers; it also inherits many features and approaches of the Hadoop project.
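The superstep structure described above can be simulated in plain Java. This is a toy illustration, not the Hama API: threads play the role of peers, and a `CyclicBarrier` plays the role of the barrier synchronization that ends every superstep.

```java
import java.util.concurrent.CyclicBarrier;

// Illustrative sketch of the BSP execution model: every peer performs
// its local computation for the superstep, then all peers meet at a
// barrier before any of them may start the next superstep.
public class BspSketch {

    public static int[] run(int peers, int supersteps) throws Exception {
        int[] counters = new int[peers];              // work done by each peer
        CyclicBarrier barrier = new CyclicBarrier(peers);
        Thread[] ts = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            final int id = p;
            ts[p] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        counters[id]++;               // local computation
                        barrier.await();              // barrier synchronization
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            ts[p].start();
        }
        for (Thread t : ts) t.join();
        return counters; // each peer executed exactly `supersteps` steps
    }

    public static void main(String[] args) throws Exception {
        for (int c : run(4, 3)) System.out.print(c + " ");
        System.out.println();
    }
}
```

In real Hama code the barrier also delivers the messages sent during the superstep; this sketch only shows the lockstep structure.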
=== Problem description ===
As a sequential problem SpMV is almost trivial. But for the parallel version we must think about some additional aspects:
1. Partitioning of matrix and vector components. We should split the input matrix and vectors among peers if we want to benefit from a parallel algorithm. The partitioning must be chosen wisely; otherwise data locality won't be achieved and communication time will grow sharply, or we will get a large load imbalance and the algorithm will be inefficient.
2. Load balancing. Each peer must perform nearly the same amount of work, and none of them should sit idle.
3. We must take into account the Hadoop and Hama approach to parallelization.
=== Implementation tips ===
1. The framework splits the input file among peers automatically, so we don't need to map the matrix to peers manually. We only have to define how a matrix is written to a file and how it is read back from it. If we create a matrix that consists of separate cells, the framework will give some subset of cells to each peer. If we create a matrix consisting of rows, the framework will give a subset of rows to each peer. The ways to influence partitioning are: creating different writables for matrices, and overriding the default partitioner class behavior.
2. We don't need to care about communication in the case of row-wise matrix access. First of all, the rows of the matrix are split automatically by the framework. After that we can compute the inner product of the vector and a concrete matrix row, and the result can be written directly to the output, because it is one of the cells of the result vector. In this case we assume that a peer's memory can fit two vectors. Even with a million-by-million matrix and a vector of size one million, a few megabytes are enough to store the two vectors. Even if we split the input vector, the gain in memory would be insignificant.
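The communication-free local step described in point 2 can be sketched as follows (an illustration, not the repository code): the inner product of one sparse row with the whole dense vector directly yields one cell of the result vector.

```java
// Illustrative sketch: one peer holds the full dense input vector and
// one sparse matrix row; their inner product is one result-vector cell.
public class RowTimesVector {

    // Sparse row as parallel arrays of column indices and values.
    static double innerProduct(int[] cols, double[] vals, double[] dense) {
        double sum = 0;
        for (int k = 0; k < cols.length; k++) {
            sum += vals[k] * dense[cols[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Row [1 0 6 0] against vector [2 3 6 0]: 1*2 + 6*6 = 38.
        double cell = innerProduct(new int[]{0, 2}, new double[]{1, 6},
                                   new double[]{2, 3, 6, 0});
        System.out.println(cell); // prints 38.0
    }
}
```

No messages between peers are needed for this step, which is exactly why row-wise access keeps the algorithm simple.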
=== Algorithm description ===
1. Custom partitioning.
2. Local computation.
3. Output of result vector.
4. Construction of the dense vector.
In the setup stage every peer reads the input dense vector from a file. After that, the framework automatically partitions the matrix rows by the algorithm provided in the custom partitioner. Then the local computation is performed: in the bsp procedure we obtain some cells of the result vector, and they are written to the output file. The output file is reread to construct an instance of a dense vector for further computation.

=== Implementation ===
The implementation can be found in my GitHub repository at [[https://github.com/ParafeniukMikalaj/spmv]], and a patch can be found in [[https://issues.apache.org/jira/browse/HAMA-524|Apache JIRA]] as soon as JIRA becomes available. The GitHub repository contains only the classes related to SpMV. I considered two possible use cases of SpMV:
1. Usage together with `RandomMatrixGenerator`.
2. Usage with arbitrary text files.
In this section you will see how to use SpMV in these two cases. NOTE: currently SpMV is buggy, so the output can be incorrect.
==== Usage with RandomMatrixGenerator ====
`RandomMatrixGenerator`, like `SpMV`, works with the sequence file format. So, to multiply a random matrix with a random vector we will do the following: generate the matrix and vector; run SpMV; convert the matrix, vector and result to text files; and view them. This sequence is described by the following code snippet:
{{{
1: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar rmgenerator /user/hduser/spmv/matrixseq 4 4 0.3 4
2: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar rmgenerator /user/hduser/spmv/vectorseq 1 4 0.8 1
3: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar spmv /user/hduser/spmv/matrixseq /user/hduser/spmv/vectorseq /user/hduser/spmv/resultseq 4
4: ../hadoop/bin/hadoop dfs -rmr /user/hduser/spmv/resultseq/part
5: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext /user/hduser/spmv/matrixseq /user/hduser/spmv/matrixtxt
6: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext /user/hduser/spmv/vectorseq /user/hduser/spmv/vectortxt
7: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext /user/hduser/spmv/resultseq /user/hduser/spmv/resulttxt
8: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/matrixtxt/*
0 4 1 2 0.09775514911904559
3 4 1 1 0.22718006778335464
1 4 1 1 0.796916052801057
3 4 1 2 0.9680719390036476
2 4 1 0 0.44269679525022
9: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/vectortxt/*
0 4 4 0 0.3111172930766064 1 0.27242674637140374 2 0.11394746764097863 3 0.0
10: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/resulttxt/*
0 5 5 0 0.011138951690981488 1 0.21710124739573375 2 0.1377306285919371 3 0.11030934594375758 4 0.0
}}}
Lines 1-2: Generation of the input matrix and vector.<<BR>>
Line 3: The SpMV algorithm.<<BR>>
Line 4: Deletion of part files from the output directory. NOTE: `matrixtotext` will fail if this step is not performed, because `resultseq` will contain a part folder and `matrixtotext` doesn't know how to deal with it yet.<<BR>>
Lines 5-7: Conversion of the input matrix, input vector and result to text format.<<BR>>
Lines 8-10: Showing the result. NOTE: currently SpMV is buggy: you can see that the output vector has length 5, and the last cell of the vector has an incorrect value. This will be fixed soon.

==== Usage with arbitrary text files ====
SpMV works with `SequenceFile`, so we need tools to convert the input and output of SpMV between sequence file format and text format. These tools are `matrixtoseq` and `matrixtotext`. Both programs are included in the example driver, so they can be launched like any other example. `matrixtoseq` converts a matrix represented in a text file to sequence file format. This program also lets you choose the target writable: `DenseVectorWritable` or `SparseVectorWritable`.
{{{
Usage: matrixtoseq <input matrix dir> <output matrix dir> <dense|sparse> [number of tasks (default max)]
}}}
`matrixtotext` converts a matrix from sequence file format to a text file.
{{{
Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks (default max)]
}}}
To use SpMV in this mode you should provide text files in the appropriate format. I decided to represent all matrices and vectors as follows: each row of the matrix is represented by the row index, the length of the row, the number of nonzero items, and pairs of index and value. All values inside a row are separated by whitespace, and rows are separated by newlines. Vectors are represented as matrix rows with an arbitrary row index (not used). So, for example:
{{{
[1 0 2]   3 2 0 1 2 2
[0 0 0] = 3 0
[0 5 1]   3 2 1 5 2 1
}}}
Now let's work through an example. Imagine that you need to multiply
{{{
[1 0 6 0]   [2]   [38]
[0 4 0 0] * [3] = [12]
[0 2 3 0]   [6]   [24]
[3 0 0 5]   [0]   [6]
}}}
First of all, you should create appropriate text files for the input matrix and input vector. The input matrix file should look like
{{{
0 4 2 0 1 2 6
1 4 1 1 4
2 4 2 1 2 2 3
3 4 2 0 3 3 5
}}}
The vector file should look like
{{{
0 4 3 0 2 1 3 2 6
}}}
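Before running the distributed version, this worked example can be checked with a small sequential sketch in plain Java (an illustration only; the class and method names are made up): it parses the text row format described above and multiplies the matrix by the vector.

```java
// Illustrative sketch: parse the text row format
// ("row index, row length, number of nonzeros, index/value pairs")
// and verify the 4x4 worked example sequentially.
public class TextFormatSpmv {

    // Expand one text row into a dense array of the declared length.
    static double[] parseRowDense(String line) {
        String[] t = line.trim().split("\\s+");
        int len = Integer.parseInt(t[1]);
        int nnz = Integer.parseInt(t[2]);
        double[] dense = new double[len];
        for (int k = 0; k < nnz; k++) {
            int idx = Integer.parseInt(t[3 + 2 * k]);
            dense[idx] = Double.parseDouble(t[4 + 2 * k]);
        }
        return dense;
    }

    // u = A*v, with A and v given as text rows in the page's format.
    static double[] multiply(String[] matrixLines, String vectorLine) {
        double[] v = parseRowDense(vectorLine);
        double[] u = new double[matrixLines.length];
        for (String line : matrixLines) {
            int row = Integer.parseInt(line.trim().split("\\s+")[0]);
            double[] r = parseRowDense(line);
            double sum = 0;
            for (int j = 0; j < v.length; j++) sum += r[j] * v[j];
            u[row] = sum;
        }
        return u;
    }

    public static void main(String[] args) {
        String[] m = {"0 4 2 0 1 2 6", "1 4 1 1 4",
                      "2 4 2 1 2 2 3", "3 4 2 0 3 3 5"};
        double[] u = multiply(m, "0 4 3 0 2 1 3 2 6");
        System.out.println(java.util.Arrays.toString(u)); // prints [38.0, 12.0, 24.0, 6.0]
    }
}
```

The result matches the [38 12 24 6] vector from the example, which is also what the SpMV run below should print.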
After that you should copy these files to HDFS. If you don't feel comfortable with HDFS, please see [[http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html|this tutorial]]. I propose the following directory structure for this example:
{{{
/user/hduser/spmv/matrixseq
/user/hduser/spmv/matrixtxt
/user/hduser/spmv/resultseq
/user/hduser/spmv/resulttxt
/user/hduser/spmv/vectorseq
/user/hduser/spmv/vectortxt
}}}
The suffix `seq` denotes that a directory contains sequence files; the suffix `txt` denotes that a directory contains human-readable text files in the format described above. After you have copied the input matrix into `matrixtxt` and the input vector into `vectortxt`, we are ready to start. The following code snippet shows how you can multiply matrices in this mode. Explanations are given below.
{{{
1: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtoseq /user/hduser/spmv/matrixtxt /user/hduser/spmv/matrixseq sparse 4
2: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtoseq /user/hduser/spmv/vectortxt /user/hduser/spmv/vectorseq dense 4
3: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar spmv /user/hduser/spmv/matrixseq /user/hduser/spmv/vectorseq /user/hduser/spmv/resultseq 4
4: ../hadoop/bin/hadoop dfs -rmr /user/hduser/spmv/resultseq/part
5: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext /user/hduser/spmv/resultseq /user/hduser/spmv/resulttxt
6: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/resulttxt/part-00000
0 4 4 0 38.0 1 12.0 2 24.0 3 6.0
}}}
Line 1: Converting the input matrix to sequence file format, internally consisting of `SparseVectorWritable`.<<BR>>
Line 2: Converting the input vector to sequence file format, internally consisting of `DenseVectorWritable`.<<BR>>
Line 3: The SpMV algorithm.<<BR>>
Line 4: We delete the part files from the output directory. NOTE: `matrixtotext` will fail if this step is not performed, because `resultseq` will contain a part folder and `matrixtotext` doesn't know how to deal with it yet.<<BR>>
Line 5: Conversion of the result vector to text format.<<BR>>
Line 6: Output of the result vector. You can see that we obtained the expected vector.<<BR>>

=== Possible improvements ===
1. Bug fixing. My main aim now is to provide stable operation of SpMV.
2. A significant improvement in the total running time of the algorithm can be achieved by creating a custom partitioner class. It will give us load balancing and therefore better efficiency. This is the main opportunity for optimization, because we decided that using row-wise matrix access is acceptable. It might be achieved by reordering the input or by customizing the partitioning algorithm of the framework.
=== Literature ===
1. Rob H. Bisseling, Parallel Scientific Computation (chapter 4).
