Saturday, April 17, 2010

Parallelism for Matrix Multiplication

Introduction
In this article we have generated some smalls tests on Parallelism using Dot Net Framework 4.0.  Parallelism is basically the concept of executing two or more tasks in parallel/concurrently, and thus having better performance.  To achieve parallelism two or more cores are required.  With having only a single core, parallelism can only be achieved virtually, through the use of context switches, though performance improvement will not be achieved.  The construct used for parallelism to produce the below tests, is documented below.


Construct Details
Summary: Executes a for loop in which iterations may run in parallel.
Parameters
  • fromInclusive: The start index, inclusive.
  • toExclusive: The end index, exclusive.
  • body: The delegate that is invoked once per iteration.

Returns: A System.Threading.Tasks.ParallelLoopResult structure that contains information on what portion of the loop completed.
Exceptions
  • System.ArgumentNullException: The exception that is thrown when the body argument is null.
  • System.AggregateException: The exception that is thrown to contain an exception thrown from one of the specified delegates.

public static ParallelLoopResult For(int fromInclusive, int toExclusive, Action body);

Methodology
The test performed was a simple squared matrix multiplication and taking timing benchmarks on multiplying two matrices using different matrix dimensions.  The tests where performed using:
  • standard sequential method,
  • parallelism on the inner loop, 
  • parallelism on both the inner loop, and 
  • parallelism on the outer loop.

The machine used has the following specifications: 
Intel(R) Core(TM)2 CPU 6400 @ 2.16, 2.14 GHz, 1.00GB of RAM


Results
These are the results achieved (time shown is in seconds):


                         10          100         1000        2000
Sequential        0.0         0.02        38.08       309.58
Inner Loop       0.0         0.05        28.80       242.55
Both Loops      0.0         0.03        25.03       204.83
Outer Loop      0.0         0.03        20.98       170.25




Conclusions
As per the results produced, it can be concluded that the best performance was achieved using parallelism on the outer loop.  This should provide better results when applied on a hardware having more cores to aid in executing more tasks in parallel.


Code
Below is the code for all the mentioned above methods using C#:


public void SequentialMatrixMultiplication(int size, double[,] m1, double[,] m2, double[,] result)
{
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    }
}


public void ParallelMatrixMultiplicationOuterLoop(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, delegate(int i)
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}


public void ParallelMatrixMultiplicationBothLoops(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, delegate(int i)
    {
        Parallel.For(0, size, delegate(int j)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        });
    });
}

public void ParallelMatrixMultiplicationInnerLoop(int size, double[,] m1, double[,] m2, double[,] result)
{
    for (int i = 0; i < size; i++)
    {
        Parallel.For(0, size, delegate(int j)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        });
    }
}

No comments: