{
"metadata": {
"name": "",
"signature": "sha256:cfaec7dd88cb8bdc337de9b007139ebaf49f66a7b8f67632f795eb742e0fde10"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A beginners guide to using Python for performance computing\n",
"\n",
"TAGS: Performance\n",
"\n",
"A comparison of weave with NumPy, Pyrex, Psyco, Fortran (77 and 90) and C++\n",
"for solving Laplace's equation. This article was originally written by Prabhu\n",
"Ramachandran.\n",
"\n",
"[laplace.py](files/attachments/PerformancePython/laplace.py) \n",
"is the complete Python code discussed below. The source tarball (\n",
"[perfpy_2.tgz](files/attachments/PerformancePython/perfpy_2.tgz) ) contains in addition the Fortran code, the pure C++ code, the\n",
"Pyrex sources and a setup.py script to build the f2py and Pyrex module.\n",
"\n",
"## Introduction\n",
"\n",
"This is a simple introductory document to using Python for performance\n",
"computing. We'll use NumPy, SciPy's weave (using both weave.blitz and\n",
"weave.inline) and Pyrex. We will also show how to use f2py to wrap a Fortran\n",
"subroutine and call it from within Python.\n",
"\n",
"We will also use this opportunity to benchmark the various ways to solve a\n",
"particular numerical problem in Python and compare them to an implementation\n",
"of the algorithm in C++.\n",
"\n",
"## Problem description\n",
"\n",
"The example we will consider is a very simple (read, trivial) case of solving\n",
"the 2D Laplace equation using an iterative finite difference scheme (four\n",
"point averaging, Gauss-Seidel or Gauss-Jordan). The formal specification of\n",
"the problem is as follows. We are required to solve for some unknown function\n",
"u(x,y) such that \u22072u = 0 with a boundary condition specified. For convenience\n",
"the domain of interest is considered to be a rectangle and the boundary values\n",
"at the sides of this rectangle are given.\n",
"\n",
"It can be shown that this problem can be solved using a simple four point\n",
"averaging scheme as follows. Discretise the domain into an (nx x ny) grid of\n",
"points. Then the function u can be represented as a 2 dimensional array -\n",
"u(nx, ny). The values of u along the sides of the rectangle are given. The\n",
"solution can be obtained by iterating in the following manner.\n",
"\n",
" \n",
" \n",
" for i in range(1, nx-1):\n",
" for j in range(1, ny-1):\n",
" u[i,j] = ((u[i-1, j] + u[i+1, j])*dy**2 +\n",
" (u[i, j-1] + u[i, j+1])*dx**2)/(2.0*(dx**2 + dy**2))\n",
" \n",
"\n",
"Where dx and dy are the lengths along the x and y axis of the discretised\n",
"domain.\n",
"\n",
"## Numerical Solution\n",
"\n",
"Implementing a solver for this is straight forward in Pure Python. Use a\n",
"simple NumPy array to store the solution matrix u. The following code\n",
"demonstrates a simple solver.\n",
"\n",
" \n",
" \n",
" import numpy\n",
" \n",
" class Grid:\n",
" \"\"\"A simple grid class that stores the details and solution of the\n",
" computational grid.\"\"\"\n",
" def __init__(self, nx=10, ny=10, xmin=0.0, xmax=1.0,\n",
" ymin=0.0, ymax=1.0):\n",
" self.xmin, self.xmax, self.ymin, self.ymax = xmin, xmax, ymin, ymax\n",
" self.dx = float(xmax-xmin)/(nx-1)\n",
" self.dy = float(ymax-ymin)/(ny-1)\n",
" self.u = numpy.zeros((nx, ny), 'd')\n",
" # used to compute the change in solution in some of the methods.\n",
" self.old_u = self.u.copy()\n",
" \n",
" def setBCFunc(self, func):\n",
" \"\"\"Sets the BC given a function of two variables.\"\"\"\n",
" xmin, ymin = self.xmin, self.ymin\n",
" xmax, ymax = self.xmax, self.ymax\n",
" x = numpy.arange(xmin, xmax + self.dx*0.5, self.dx)\n",
" y = numpy.arange(ymin, ymax + self.dy*0.5, self.dy)\n",
" self.u[0 ,:] = func(xmin,y)\n",
" self.u[-1,:] = func(xmax,y)\n",
" self.u[:, 0] = func(x,ymin)\n",
" self.u[:,-1] = func(x,ymax)\n",
" \n",
" def computeError(self):\n",
" \"\"\"Computes absolute error using an L2 norm for the solution.\n",
" This requires that self.u and self.old_u must be appropriately\n",
" setup.\"\"\"\n",
" v = (self.u - self.old_u).flat\n",
" return numpy.sqrt(numpy.dot(v,v))\n",
" \n",
" \n",
" class LaplaceSolver:\n",
" \"\"\"A simple Laplacian solver that can use different schemes to\n",
" solve the problem.\"\"\"\n",
" def __init__(self, grid, stepper='numeric'):\n",
" self.grid = grid\n",
" self.setTimeStepper(stepper)\n",
" \n",
" def slowTimeStep(self, dt=0.0):\n",
" \"\"\"Takes a time step using straight forward Python loops.\"\"\"\n",
" g = self.grid\n",
" nx, ny = g.u.shape\n",
" dx2, dy2 = g.dx**2, g.dy**2\n",
" dnr_inv = 0.5/(dx2 + dy2)\n",
" u = g.u\n",
" \n",
" err = 0.0\n",
" for i in range(1, nx-1):\n",
" for j in range(1, ny-1):\n",
" tmp = u[i,j]\n",
" u[i,j] = ((u[i-1, j] + u[i+1, j])*dy2 +\n",
" (u[i, j-1] + u[i, j+1])*dx2)*dnr_inv\n",
" diff = u[i,j] - tmp\n",
" err += diff*diff\n",
" \n",
" return numpy.sqrt(err)\n",
" \n",
" def setTimeStepper(self, stepper='numeric'):\n",
" \"\"\"Sets the time step scheme to be used while solving given a\n",
" string which should be one of ['slow', 'numeric', 'blitz',\n",
" 'inline', 'fastinline', 'fortran'].\"\"\"\n",
" if stepper == 'slow':\n",
" self.timeStep = self.slowTimeStep\n",
" # ...\n",
" else:\n",
" self.timeStep = self.numericTimeStep\n",
" \n",
" def solve(self, n_iter=0, eps=1.0e-16):\n",
" err = self.timeStep()\n",
" count = 1\n",
" \n",
" while err > eps:\n",
" if n_iter and count >= n_iter:\n",
" return err\n",
" err = self.timeStep()\n",
" count = count + 1\n",
" \n",
" return count\n",
" \n",
"\n",
"The code is pretty simple and very easy to write but if we run it for any\n",
"sizeable problem (say a 500 x 500 grid of points), we'll see that it takes\n",
"*forever* to run. The CPU hog in this case is the `slowTimeStep` method. In\n",
"the next section we will speed it up using NumPy.\n",
"\n",
"## Using NumPy\n",
"\n",
"It turns out that the innermost loop of the `LaplaceSolver.slowTimeStep`\n",
"method can be readily expressed by a much simpler NumPy expression. Here is a\n",
"re-written timeStep method.\n",
"\n",
" \n",
" \n",
" def numericTimeStep(self, dt=0.0):\n",
" \"\"\"Takes a time step using a NumPy expression.\"\"\"\n",
" g = self.grid\n",
" dx2, dy2 = g.dx**2, g.dy**2\n",
" dnr_inv = 0.5/(dx2 + dy2)\n",
" u = g.u\n",
" g.old_u = u.copy() # needed to compute the error.\n",
" \n",
" # The actual iteration\n",
" u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +\n",
" (u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv\n",
" \n",
" return g.computeError()\n",
"\n",
"The entire for i and j loops have been replaced by a single NumPy expression.\n",
"NumPy expressions operate elementwise and hence the above expression works. It\n",
"basically computes the four point average. If you have gone through the NumPy\n",
"tutorial and played with [NumPy](NumPy.html) a bit you should be able to\n",
"understand how this works. The beauty of the expression is that its completely\n",
"done in C. This makes the computation *much* faster. For a quick comparison\n",
"here are some numbers for a single iteration on a 500x500 grid. On a PIII\n",
"450Mhz with 192 MB RAM, the above takes about 0.3 seconds whereas the previous\n",
"one takes around 15 seconds. This is close to a 50 fold speed increase. You\n",
"will also note a few things.\n",
"\n",
" 1. We cannot compute the error the way we did earlier inside the for loop. We need to make a copy of the data and then use the computeError function to do this. This costs us memory and is not very pretty. This is certainly a limitation but is worth a 50 fold speed increase. \n",
" 2. The expression will use temporaries. Hence, during one iteration, the computed values at an already computed location will not be used during the iteration. For instance, in the original for loop, once the value of u[1,1] is computed, the next value for u[1,2] will use the newly computed u[1,1] and not the old one. However, since the NumPy expression uses temporaries internally, only the old value of u[1,1] will be used. This is not a serious issue in this case because it is known that even when this happens the algorithm will converge (but in twice as much time, which reduces the benefit by a factor of 2, which still leaves us with a 25 fold increase). \n",
"\n",
"Apart from these two issues its clear that using NumPy boosts speed\n",
"tremendously. We will now use the amazing weave package to speed this up\n",
"further.\n",
"\n",
"## Using weave.blitz\n",
"\n",
"The NumPy expression can be speeded up quite a bit if we use weave.blitz. Here\n",
"is the new function.\n",
"\n",
" \n",
" \n",
" # import necessary modules and functions\n",
" from scipy import weave\n",
" # ...\n",
" def blitzTimeStep(self, dt=0.0):\n",
" \"\"\"Takes a time step using a NumPy expression that has been\n",
" blitzed using weave.\"\"\"\n",
" g = self.grid\n",
" dx2, dy2 = g.dx**2, g.dy**2\n",
" dnr_inv = 0.5/(dx2 + dy2)\n",
" u = g.u\n",
" g.old_u = u.copy()\n",
" \n",
" # The actual iteration\n",
" expr = \"u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 + \"\\\n",
" \"(u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv\"\n",
" weave.blitz(expr, check_size=0)\n",
" \n",
" return g.computeError()\n",
" \n",
"\n",
"If you notice, the only thing that has changed is that we put quotes around\n",
"the original numeric expression and call this string 'expr' and then invoke\n",
"weave.blitz. The 'check_size' keyword when set to 1 does a few sanity checks\n",
"and is to be used when you are debugging your code. However, for pure speed it\n",
"is wise to set it to 0. This time when we time the code for a 500x500 array\n",
"for a single iteration it takes only about 0.1 seconds which is about a three\n",
"fold increase! There are again a few things to note.\n",
"\n",
" 1. The first time this method is called, it will take a long while to do some magic behind your back. The next time it is called, it will run immediately. More details on this are in the weave documentation. Basically, weave.blitz converts the NumPy expression into C++ code and uses blitz++ for the array expression, builds a Python module, stores it in a special place and invokes that the next time the function call is made. \n",
" 2. Again we need to use a temporary array to compute the error. \n",
" 3. blitz does *not* use temporaries for the computation and therefore behaves more like the original (slow) for loop in that the computed values are re-used immediately. \n",
"\n",
"Apart from these points, the results are identical as compared to the original\n",
"for loop. It's only about 170 times faster than the original code! We will now\n",
"look at yet another way to speed up our original for loop. Enter weave.inline!\n",
"\n",
"## Using weave.inline\n",
"\n",
"Inline allows one to embed C or C++ code directly into your Python code. Here\n",
"is a simple version of an inlined version of the code.\n",
"\n",
" \n",
" \n",
" from scipy.weave import converters\n",
" # ...\n",
" def inlineTimeStep(self, dt=0.0):\n",
" \"\"\"Takes a time step using inlined C code -- this version uses\n",
" blitz arrays.\"\"\"\n",
" g = self.grid\n",
" nx, ny = g.u.shape\n",
" dx2, dy2 = g.dx**2, g.dy**2\n",
" dnr_inv = 0.5/(dx2 + dy2)\n",
" u = g.u\n",
" \n",
" code = \"\"\"\n",
" #line 120 \"laplace.py\" (This is only useful for debugging)\n",
" double tmp, err, diff;\n",
" err = 0.0;\n",
" for (int i=1; i \"d\":\n",
" raise TypeError(\"Double array required\")\n",
" if u.nd <> 2:\n",
" raise ValueError(\"2 dimensional array required\")\n",
" cdef int nx, ny\n",
" cdef double dx2, dy2, dnr_inv, err\n",
" cdef double *elem\n",
" \n",
" nx = u.dimensions[0]\n",
" ny = u.dimensions[1]\n",
" dx2, dy2 = dx**2, dy**2\n",
" dnr_inv = 0.5/(dx2 + dy2)\n",
" elem = u.data\n",
" \n",
" err = 0.0\n",
" cdef int i, j\n",
" cdef double *uc, *uu, *ud, *ul, *ur\n",
" cdef double diff, tmp\n",
" for i from 1 <= i < nx-1:\n",
" uc = elem + i*ny + 1\n",
" ur = elem + i*ny + 2\n",
" ul = elem + i*ny\n",
" uu = elem + (i+1)*ny + 1\n",
" ud = elem + (i-1)*ny + 1\n",
" \n",
" for j from 1 <= j < ny-1:\n",
" tmp = uc[0]\n",
" uc[0] = ((ul[0] + ur[0])*dy2 +\n",
" (uu[0] + ud[0])*dx2)*dnr_inv\n",
" diff = uc[0] - tmp\n",
" err = err + diff*diff\n",
" uc = uc + 1; ur = ur + 1; ul = ul + 1\n",
" uu = uu + 1; ud = ud + 1\n",
" \n",
" return sqrt(err)\n",
" \n",
"\n",
"The function looks long but is not too hard to write. It is also possible to\n",
"write without doing the pointer arithmetic by providing convenient functions\n",
"to access the array. However, the code shown above is fast. The sources\n",
"provided with this article contains the complete Pyrex file and also a\n",
"setup.py script to build it. Timing this version, we find that this version is\n",
"as fast as the fast inlined version and takes only 0.025 seconds.\n",
"\n",
"## Using Matlab and Octave\n",
"\n",
"We have implemented the Numeric version in Matlab and Octave ( [laplace.m](Per\n",
"formancePythonffae.html?action=AttachFile&do=view&target=laplace.m \"\" ) ) and\n",
"run the tests on a different computer (hence the \"estimate\" values in the\n",
"table below). We have found that no significant speed-up is obtained in\n",
"Matlab, while Octave runs twice slower than [NumPy](NumPy.html). Detailed\n",
"graphs can be found\n",
"[here](http://lbolla.wordpress.com/2007/04/11/numerical-\n",
"computing-matlab-vs-pythonnumpyweave/).\n",
"\n",
"## An implementation in C++\n",
"\n",
"Finally, for comparison we implemented this in simple C++ (nothing fancy)\n",
"without any Python. One would expect that the C++ code would be faster but\n",
"surprisingly, not by much! Given the fact that it's so easy to develop with\n",
"Python, this speed reduction is not very significant.\n",
"\n",
"## A final comparison\n",
"\n",
"Here are some timing results for a 500x500 grid for 100 iterations. Note that\n",
"we also have a comparison of results of using the slow Python version along\n",
"with Psyco.\n",
"\n",
"Type of solution |Time taken (sec) \n",
"---|--- \n",
"Python (estimate) |1500.0 \n",
"Python + Psyco (estimate) |1138.0 \n",
"Python + [NumPy](NumPy.html) Expression | 29.3 \n",
"Blitz | 9.5 \n",
"Inline | 4.3 \n",
"Fast Inline | 2.3 \n",
"Python/Fortran | 2.9 \n",
"Pyrex | 2.5 \n",
"Matlab (estimate) | 29.0 \n",
"Octave (estimate) | 60.0 \n",
"Pure C++ | 2.16 \n",
" \n",
"This is pretty amazing considering the flexibility and power of Python.\n",
"\n",
"Download the source code for this guide here: [perfpy_2.tgz](files/attachments/PerformancePython/perfpy_2.tgz)\n",
"\n",
"View the complete Python code for the example: [laplace.py](files/attachments/PerformancePython/laplace.py)\n",
"\n",
"View the complete Matlab/Octave code for the example: [laplace.m](files/attachments/PerformancePython/laplace.py)"
]
}
],
"metadata": {}
}
]
}