{
"metadata": {
"signature": "sha256:f71aa5c594c36f0b4aaf7097997e21d8fc4e922497e816c79bdc338e838e2c2b"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing numpy arrays\n",
"=====================\n",
"\n",
"The whole point of numpy is to introduce a multidimensional array object\n",
"for holding homogeneously-typed numerical data. This is of course a\n",
"useful tool for storing data, but it is also possible to manipulate\n",
"large numbers of values without writing inefficient python loops. To\n",
"accomplish this, one needs to be able to refer to elements of the arrays\n",
"in many different ways, from simple \"slices\" to using arrays as lookup\n",
"tables. The purpose of this page is to go over the various different\n",
"types of indexing available. Hopefully the sometimes-peculiar syntax\n",
"will also become more clear.\n",
"\n",
"\n",
"\n",
"We will use the same arrays as examples wherever possible:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"A = np.arange(10)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"B = np.reshape(np.arange(9),(3,3))\n",
"B"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"array([[0, 1, 2],\n",
" [3, 4, 5],\n",
" [6, 7, 8]])"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"C = np.reshape(np.arange(2*3*4),(2,3,4))\n",
"C"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"array([[[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]],\n",
"\n",
" [[12, 13, 14, 15],\n",
" [16, 17, 18, 19],\n",
" [20, 21, 22, 23]]])"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Elements\n",
"--------\n",
"\n",
"The simplest way to pick one or some elements of an array looks very\n",
"similar to python lists:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"1"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"B[1,0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"3"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"C[1,0,2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"14"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is, to pick out a particular element, you simply put the indices\n",
"into square brackets after it. As is standard for python, element\n",
"numbers start at zero.\n",
"\n",
"If you want to change an array value in-place, you can simply use the\n",
"syntax above in an assignment:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"T = A.copy()\n",
"T[3] = -5\n",
"T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"array([ 0, 1, 2, -5, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"T[0] += 7\n",
"T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"array([ 7, 1, 2, -5, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(The business with .copy() is to ensure that we don't actually modify A,\n",
"since that would make further examples confusing.) Note that numpy also\n",
"supports python's \"augmented assignment\" operators, +=, -=, \\*=, and so\n",
"on.\n",
"\n",
"Be aware that the type of array elements is a property of the array\n",
"itself, so that if you try to assign an element of another type to an\n",
"array, it will be silently converted (if possible):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"T = A.copy()\n",
"T[3] = -1.5\n",
"T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"array([ 0, 1, 2, -1, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"T[3] = -0.5j"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "can't convert complex to long",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mT\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m0.5j\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: can't convert complex to long"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"array([ 0, 1, 2, -1, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the conversion that happens is a default conversion; in the\n",
"case of float to int conversion, it's truncation. If you wanted\n",
"something different, say taking the floor, you would have to arrange\n",
"that yourself (for example with np.floor()). In the case of converting\n",
"complex values to integers, there's no resonable default way to do it,\n",
"so numpy raises an exception and leaves the array unchanged.\n",
"\n",
"Finally, two slightly more technical matters.\n",
"\n",
"If you want to manipulate indices programmatically, you should know that\n",
"when you write something like"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"C[1,0,1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"13"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"it is the same as (in fact it is internally converted to)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"C[(1,0,1)]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"13"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This peculiar-looking syntax is constructing a tuple, python's data\n",
"structure for immutable sequences, and using that tuple as an index into\n",
"the array. (Under the hood, C[1,0,1] is converted to\n",
"C.\\_\\_getitem\\_\\_((1,0,1)).) This means you can whip up tuples if you\n",
"want to:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"i = (1,0,1)\n",
"C[i]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
"13"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If it doesn't seem likely you would ever want to do this, consider\n",
"iterating over an arbitrarily multidimensional array:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for i in np.ndindex(B.shape):\n",
" print i, B[i]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(0, 0) 0\n",
"(0, 1) 1\n",
"(0, 2) 2\n",
"(1, 0) 3\n",
"(1, 1) 4\n",
"(1, 2) 5\n",
"(2, 0) 6\n",
"(2, 1) 7\n",
"(2, 2) 8\n"
]
}
],
"prompt_number": 17
},
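{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tuples also come back from numpy functions. For instance, np.unravel_index()\n",
"converts a position in the flattened array into a tuple with one index per\n",
"axis, and that tuple can be used directly as an index. A minimal sketch\n",
"(idx is an illustrative name):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"C = np.reshape(np.arange(2*3*4), (2,3,4))   # the same C as above\n",
"idx = np.unravel_index(13, C.shape)         # flat position 13 -> one index per axis\n",
"print(tuple(int(k) for k in idx))           # (1, 0, 1)\n",
"C[idx]                                      # 13, the same element as C[1,0,1]"
],
"language": "python",
"metadata": {},
"outputs": []
},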
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing with tuples will also become important when we start looking at\n",
"fancy indexing and the function np.where().\n",
"\n",
"The last technical issue I want to mention is that when you select an\n",
"element from an array, what you get back has the same type as the array\n",
"elements. This may sound obvious, and in a way it is, but keep in mind\n",
"that even innocuous numpy arrays like our A, B, and C often contain\n",
"types that are not quite the python types:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = C[1,2,3]\n",
"a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
"23"
]
}
],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(a)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"numpy.int64"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(int(a))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"int"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a**a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"-c:1: RuntimeWarning: overflow encountered in long_scalars\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"8450172506621111015"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"int(a)**int(a)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"20880467999847912034355032910567L"
]
}
],
"prompt_number": 22
},
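{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overflow warning appears because an int64 can only hold values up to a\n",
"fixed bound, which np.iinfo() reports; python's own integers have no such\n",
"limit. Besides int(), the .item() method converts a numpy scalar to the\n",
"equivalent python scalar. A short sketch (b is an illustrative name):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"a = np.int64(23)                  # the same kind of value as C[1,2,3] above\n",
"print(np.iinfo(np.int64).max)     # 9223372036854775807, the largest int64\n",
"b = a.item()                      # a plain python integer with the same value\n",
"print(b**b == 23**23)             # True: python integer arithmetic is exact"
],
"language": "python",
"metadata": {},
"outputs": []
},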
{
"cell_type": "markdown",
"metadata": {},
"source": [
"numpy scalars also support certain indexing operations, for consistency,\n",
"but these are somewhat subtle and under discussion.\n",
"\n",
"Slices\n",
"------\n",
"\n",
"It is obviously essential to be able to work with single elements of an\n",
"array. But one of the selling points of numpy is the ability to do\n",
"operations \"array-wise\":"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"2*A"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])"
]
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is handy, but one very often wants to work with only part of an\n",
"array. For example, suppose one wants to compute the array of\n",
"differences of A, that is, the array whose elements are A[1]-A[0],\n",
"A[2]-A[1], and so on. (In fact, the function np.diff does this, but\n",
"let's ignore that for expositional convenience.) numpy makes it possible\n",
"to do this using array-wise operations:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[1:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
"array([1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[:-1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8])"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[1:] - A[:-1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"array([1, 1, 1, 1, 1, 1, 1, 1, 1])"
]
}
],
"prompt_number": 26
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is done by making an array that is all but the first element of A,\n",
"an array that is all but the last element of A, and subtracting the\n",
"corresponding elements. The process of taking subarrays in this way is\n",
"called \"slicing\".\n",
"\n",
"### One-dimensional slices\n",
"\n",
"The general syntax for a slice is *array*[*start*:*stop*:*step*]. Any or\n",
"all of the values *start*, *stop*, and *step* may be left out (and if\n",
"*step* is left out the colon in front of it may also be left out):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[5:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"array([5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[:5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"array([0, 1, 2, 3, 4])"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[::2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"array([0, 2, 4, 6, 8])"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[1::2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"array([1, 3, 5, 7, 9])"
]
}
],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[1:8:2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"array([1, 3, 5, 7])"
]
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual for python, the *start* index is included and the *stop* index\n",
"is not included. Also as usual for python, negative numbers for *start*\n",
"or *stop* count backwards from the end of the array:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[-3:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"array([7, 8, 9])"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[:-3]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"array([0, 1, 2, 3, 4, 5, 6])"
]
}
],
"prompt_number": 33
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If *stop* comes before *start* in the array, then an array of length\n",
"zero is returned:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[5:3]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 34,
"text": [
"array([], dtype=int64)"
]
}
],
"prompt_number": 34
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(The \"dtype=int32\" is present in the printed form because in an array\n",
"with no elements, one cannot tell what type the elements have from their\n",
"printed representation. It nevertheless makes sense to keep track of the\n",
"type that they would have if the array had any elements.)\n",
"\n",
"If you specify a slice that happens to have only one element, you get an\n",
"array in return that happens to have only one element:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[5:6]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 35,
"text": [
"array([5])"
]
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"5"
]
}
],
"prompt_number": 36
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This seems fairly obvious and reasonable, but when dealing with fancy\n",
"indexing and multidimensional arrays it can be surprising.\n",
"\n",
"If the number *step* is negative, the step through the array is\n",
"negative, that is, the new array contains (some of) the elements of the\n",
"original in reverse order:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[::-1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])"
]
}
],
"prompt_number": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is extremely useful, but it can be confusing when *start* and\n",
"*stop* are given:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[5:3:-1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 38,
"text": [
"array([5, 4])"
]
}
],
"prompt_number": 38
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"A[3:5:1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 39,
"text": [
"array([3, 4])"
]
}
],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rule to remember is: whether *step* is positive or negative, *start*\n",
"is always included and *stop* never is.\n",
"\n",
"Just as one can retrieve elements of an array as a subarray rather than\n",
"one-by-one, one can modify them as a subarray rather than one-by-one:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T\n",
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n",
">>> T[1::2]\n",
"array([1, 3, 5, 7, 9])\n",
">>> T[1::2] = -np.arange(5)\n",
">>> T[1::2]\n",
"array([ 0, -1, -2, -3, -4])\n",
">>> T\n",
"array([ 0, 0, 2, -1, 4, -2, 6, -3, 8, -4])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the array you are trying to assign is the wrong shape, an exception\n",
"is raised:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T[1::2] = np.arange(6)\n",
"Traceback (most recent call last):\n",
" File \"\", line 1, in \n",
"ValueError: shape mismatch: objects cannot be broadcast to a single shape\n",
">>> T[:4] = np.array([[0,1],[1,0]])\n",
"Traceback (most recent call last):\n",
" File \"\", line 1, in \n",
"ValueError: shape mismatch: objects cannot be broadcast to a single shape"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you think the error message sounds confusing, I have to agree, but\n",
"there is a reason. In the first case, we tried to stuff six elements\n",
"into five slots, so numpy refused. In the second case, there were the\n",
"right number of elements - four - but we tried to stuff a two-by-two\n",
"array where there was supposed to be a one-dimensional array of length\n",
"four. While numpy could have coerced the two-by-two array into the right\n",
"shape, instead the designers chose to follow the python philosophy\n",
"\"explicit is better than implicit\" and leave any coercing up to the\n",
"user. Let's do that, though:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T[:4] = np.array([[0,1],[1,0]]).ravel()\n",
">>> T\n",
"array([0, 1, 1, 0, 4, 5, 6, 7, 8, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So in order for assignment to work, it is not simply enough to have the\n",
"right number of elements - they must be arranged in an array of the\n",
"right shape.\n",
"\n",
"There is another issue complicating the error message: numpy has some\n",
"extremely convenient rules for converting lower-dimensional arrays into\n",
"higher-dimensional arrays, and for implicitly repeating arrays along\n",
"axes. This process is called \"broadcasting\". We will see more of it\n",
"elsewhere, but here it is in its simplest possible form:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T[1::2] = -1\n",
">>> T\n",
"array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We told numpy to take a scalar, -1, and put it into an array of length\n",
"five. Rather than signal an error, numpy's broadcasting rules tell it to\n",
"convert this scalar into an effective array of length five by repeating\n",
"the scalar five times. (It does not, of course, actually create a\n",
"temporary array of this size; in fact it uses a clever trick of telling\n",
"itself that the temporary array has its elements spaced zero bytes\n",
"apart.) This particular case of broadcasting gets used all the time:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T[1::2] -= 1\n",
">>> T\n",
"array([0, 0, 2, 2, 4, 4, 6, 6, 8, 8])"
],
"language": "python",
"metadata": {},
"outputs": []
},
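{
"cell_type": "markdown",
"metadata": {},
"source": [
"The zero-stride idea can be seen directly with np.broadcast_to() (available\n",
"in numpy 1.10 and later), which returns a read-only view that repeats a\n",
"value without copying it. This is only an illustration of the trick, not\n",
"necessarily the exact code path that assignment takes:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"b = np.broadcast_to(np.array(-1), (5,))   # a length-five view of a single value\n",
"print(b.shape)                            # (5,)\n",
"print(b.strides)                          # (0,): consecutive elements are zero bytes apart"
],
"language": "python",
"metadata": {},
"outputs": []
},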
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assignment is sometimes a good reason to use the \"everything\" slice:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy()\n",
">>> T[:] = -1\n",
">>> T\n",
"array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1])\n",
">>> T = A.copy()\n",
">>> T = -1\n",
">>> T\n",
"-1"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happened here? Well, in the first case we told numpy to assign -1\n",
"to all the elements of T, so that's what it did. In the second case, we\n",
"told python \"T = -1\". In python, variables are just names that can be\n",
"attached to objects in memory. This is in sharp contrast with languages\n",
"like C, where a variable is a named region of memory where data can be\n",
"stored. Assignment to a variable name - T in this case - simply changes\n",
"which object the name refers to, without altering the underlying object\n",
"in any way. (If the name was the only reference to the original object,\n",
"it becomes impossible for your program ever to find it again after the\n",
"reassignment, so python deletes the original object to free up some\n",
"memory.) In a language like C, assigning to a variable changes the value\n",
"stored in that memory region. If you really must think in terms of C,\n",
"you can think of all python variables as holding pointers to actual\n",
"objects; assignment to a python variable is just modification of the\n",
"pointer, and doesn't affect the object pointed to (unless garbage\n",
"collection deletes it). In any case, if you want to modify the\n",
"*contents* of an array, you can't do it by assigning to the name you\n",
"gave the array; you must use slice assignment or some other approach.\n",
"\n",
"Finally, a technical point: how can a program work with slices\n",
"programmatically? What if you want to, say, save a slice specification\n",
"to apply to many arrays later on? The answer is to use a slice object,\n",
"which is constructed using slice():"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> A[1::2]\n",
"array([1, 3, 5, 7, 9])\n",
">>> s = slice(1,None,2)\n",
">>> A[s]\n",
"array([1, 3, 5, 7, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Regrettably, you can't just write \"s = 1::2\". But within square\n",
"brackets, 1::2 is converted internally to slice(1,None,2).) You can\n",
"leave out arguments to slice() just like you can with the colon\n",
"notation, with one exception:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> A[slice(-3)]\n",
"array([0, 1, 2, 3, 4, 5, 6])\n",
">>> A[slice(None,3)]\n",
"array([0, 1, 2])\n",
">>> A[slice()]\n",
"Traceback (most recent call last):\n",
" File \"\", line 1, in \n",
"TypeError: slice expected at least 1 arguments, got 0\n",
">>> A[slice(None,None,None)]\n",
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
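{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two related tools may be handy here. numpy's np.s_ object turns colon\n",
"notation into slice objects, which gets around the *s = 1::2* limitation.\n",
"Python's built-in slice.indices() method reports the concrete start, stop,\n",
"and step a slice would use on a sequence of a given length, which makes the\n",
"*start included, stop excluded* rule explicit. A sketch (s and r are\n",
"illustrative names):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"A = np.arange(10)                          # the same A as above\n",
"s = np.s_[1::2]                            # equal to slice(1, None, 2)\n",
"print(s == slice(1, None, 2))              # True\n",
"print(np.array_equal(A[s], A[1::2]))       # True\n",
"r = slice(5, 3, -1).indices(len(A))        # (5, 3, -1)\n",
"print(list(range(*r)))                     # [5, 4], the positions A[5:3:-1] picks\n",
"print(slice(None, None, -1).indices(10))   # (9, -1, -1); this -1 is a literal stop, not the last element"
],
"language": "python",
"metadata": {},
"outputs": []
},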
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multidimensional slices\n",
"\n",
"One-dimensional arrays are extremely useful, but often one has data that\n",
"is naturally multidimensional - image data might be an N by M array of\n",
"pixel values, or an N by M by 3 array of colour values, for example.\n",
"Just as it is useful to take slices of one-dimensional arrays, it is\n",
"useful to take slices of multidimensional arrays. This is fairly\n",
"straightforward:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> B\n",
"array([[0, 1, 2],\n",
" [3, 4, 5],\n",
" [6, 7, 8]])\n",
">>> B[:2,:]\n",
"array([[0, 1, 2],\n",
" [3, 4, 5]])\n",
">>> B[:,::-1]\n",
"array([[2, 1, 0],\n",
" [5, 4, 3],\n",
" [8, 7, 6]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Essentially one simply specifies a one-dimensional slice for each axis.\n",
"One can also supply a number for an axis rather than a slice:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> B[0,:]\n",
"array([0, 1, 2])\n",
">>> B[0,::-1]\n",
"array([2, 1, 0])\n",
">>> B[:,0]\n",
"array([0, 3, 6])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that when one supplies a number for (say) the first axis, the\n",
"result is no longer a two-dimensional array; it's now a one-dimensional\n",
"array. This makes sense:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> B[:,:]\n",
"array([[0, 1, 2],\n",
" [3, 4, 5],\n",
" [6, 7, 8]])\n",
">>> B[0,:]\n",
"array([0, 1, 2])\n",
">>> B[0,0]\n",
"0"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you supply no numbers, you get a two-dimensional array; if you supply\n",
"one number, the dimension drops by one, and you get a one-dimensional\n",
"array; and if you supply two numbers the dimension drops by two and you\n",
"get a scalar. (If you think you should get a zero-dimensional array, you\n",
"are opening a can of worms. The distinction, or lack thereof, between\n",
"scalars and zero-dimensional arrays is an issue under discussion and\n",
"development.)\n",
"\n",
"If you are used to working with matrices, you may want to preserve a\n",
"distinction between \"row vectors\" and \"column vectors\". numpy supports\n",
"only one kind of one-dimensional array, but you could represent row and\n",
"column vectors as *two*-dimensional arrays, one of whose dimensions\n",
"happens to be one. Unfortunately indexing of these objects then becomes\n",
"cumbersome.\n",
"\n",
"As with one-dimensional arrays, if you specify a slice that happens to\n",
"have only one element, you get an array one of whose axes has length 1 -\n",
"the axis doesn't \"disappear\" the way it would if you had provided an\n",
"actual number for that axis:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> B[:,0:1]\n",
"array([[0],\n",
" [3],\n",
" [6]])\n",
">>> B[:,0]\n",
"array([0, 3, 6])"
],
"language": "python",
"metadata": {},
"outputs": []
},
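{
"cell_type": "markdown",
"metadata": {},
"source": [
"Checking .shape is an easy way to see which axes survive an index:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"B = np.reshape(np.arange(9), (3,3))   # the same B as above\n",
"print(B.shape)          # (3, 3)\n",
"print(B[:,0:1].shape)   # (3, 1): a length-one slice keeps the axis\n",
"print(B[:,0].shape)     # (3,): an integer index removes it\n",
"print(B[0,0].shape)     # (): two integers leave no axes at all"
],
"language": "python",
"metadata": {},
"outputs": []
},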
{
"cell_type": "markdown",
"metadata": {},
"source": [
"numpy also has a few shortcuts well-suited to dealing with arrays with\n",
"an indeterminate number of dimensions. If this seems like something\n",
"unreasonable, keep in mind that many of numpy's functions (for example\n",
"np.sort(), np.sum(), and np.transpose()) must work on arrays of\n",
"arbitrary dimension. It is of course possible to extract the number of\n",
"dimensions from an array and work with it explicitly, but one's code\n",
"tends to fill up with things like (slice(None,None,None),)\\*(C.ndim-1),\n",
"making it unpleasant to read. So numpy has some shortcuts which often\n",
"simplify things.\n",
"\n",
"First the Ellipsis object:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> A[...]\n",
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n",
">>> B[...]\n",
"array([[0, 1, 2],\n",
" [3, 4, 5],\n",
" [6, 7, 8]])\n",
">>> B[0,...]\n",
"array([0, 1, 2])\n",
">>> B[0,...,0]\n",
"array(0)\n",
">>> C[0,...,0]\n",
"array([0, 4, 8])\n",
">>> C[0,Ellipsis,0]\n",
"array([0, 4, 8])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ellipsis (three dots) indicates \"as many ':' as needed\". (Its name\n",
"for use in index-fiddling code is Ellipsis, and it's not\n",
"numpy-specific.) This makes it easy to manipulate only one dimension of\n",
"an array, letting numpy do array-wise operations over the \"unwanted\"\n",
"dimensions. You can only really have one ellipsis in any given indexing\n",
"expression, or else the expression would be ambiguous about how many ':'\n",
"should be put in each. (In fact, for some reason it is allowed to have\n",
"something like \"C[...,...]\"; this is not actually ambiguous.)\n",
"\n",
"In some circumstances, it is convenient to omit the ellipsis entirely:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> B[0]\n",
"array([0, 1, 2])\n",
">>> C[0]\n",
"array([[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])\n",
">>> C[0,0]\n",
"array([0, 1, 2, 3])\n",
">>> B[0:2]\n",
"array([[0, 1, 2],\n",
" [3, 4, 5]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't supply enough indices to an array, an ellipsis is silently\n",
"appended. This means that in some sense you can view a two-dimensional\n",
"array as an array of one-dimensional arrays. In combination with numpy's\n",
"array-wise operations, this means that functions written for\n",
"one-dimensional arrays can often just work for two-dimensional arrays.\n",
"For example, recall the difference operation we wrote out in the section\n",
"on one-dimensional slices:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> A[1:] - A[:-1]\n",
"array([1, 1, 1, 1, 1, 1, 1, 1, 1])\n",
">>> B[1:] - B[:-1]\n",
"array([[3, 3, 3],\n",
" [3, 3, 3]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It works, unmodified, to take the differences along the first axis of a\n",
"two-dimensional array.\n",
"\n",
"Writing to multidimensional slices works just the way writing to\n",
"one-dimensional slices does:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
>> T = B.copy()">
"T = B.copy()\n",
"T[1,:] = -1\n",
"T\n",
"# array([[ 0,  1,  2],\n",
"#        [-1, -1, -1],\n",
"#        [ 6,  7,  8]])\n",
"T[:,:2] = -2\n",
"T\n",
"# array([[-2, -2,  2],\n",
"#        [-2, -2, -1],\n",
"#        [-2, -2,  8]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FIXME: np.newaxis and broadcasting rules.\n",
"\n",
"### Views versus copies\n",
"\n",
"FIXME: Zero-dimensional arrays, views of a single element.\n",
"\n",
"Fancy indexing\n",
"--------------\n",
"\n",
"Slices are very handy, and the fact that they can be created as views\n",
"makes them efficient. But some operations cannot really be done with\n",
"slices; for example, suppose one wanted to square all the negative\n",
"values in an array. Short of writing a loop in python, one wants to be\n",
"able to locate the negative values, extract them, square them, and put\n",
"the new values where the old ones were:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!python numbers=disable\n",
">>> T = A.copy() - 5\n",
">>> T[T<0] **= 2\n",
">>> T\n",
"array([25, 16, 9, 4, 1, 0, 1, 2, 3, 4])"
],
"language": "python",
"metadata": {},
"outputs": []
},
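{
"cell_type": "markdown",
"metadata": {},
"source": [
"The one-liner above does all of this in a single statement. Here it is\n",
"spelled out one step at a time, as a short sketch (the variable names\n",
"are just for illustration):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"A = np.arange(10)\n",
"T = A - 5\n",
"neg = T < 0          # locate: a boolean array marking the negative values\n",
"vals = T[neg]        # extract: a new array holding just those values\n",
"T[neg] = vals ** 2   # square them and put them back where they were\n",
"print(T)"
],
"language": "python",
"metadata": {},
"outputs": []
},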
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FIXME: argsort is a better example.\n",
"\n",
"Or suppose one wants to use an array as a lookup table, that is, for an\n",
"array B, produce an array whose (i, j)th element is LUT[B[i,j]]:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> LUT = np.sin(A)\n",
">>> LUT\n",
"array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,\n",
" -0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])\n",
">>> LUT[B]\n",
"array([[ 0. , 0.84147098, 0.90929743],\n",
" [ 0.14112001, -0.7568025 , -0.95892427],\n",
" [-0.2794155 , 0.6569866 , 0.98935825]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
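{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting is another common case. np.argsort() returns the indices that\n",
"would sort an array, and indexing with those indices reorders that array\n",
"or any parallel array. The sketch below uses made-up data and labels.\n",
"Passing kind='stable' keeps equal values in their original order, so the\n",
"result is reproducible:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"x = np.array([3, 1, 4, 1, 5, 9, 2, 6])\n",
"names = np.array(list('abcdefgh'))   # made-up labels, one per value\n",
"\n",
"order = np.argsort(x, kind='stable')  # indices that would sort x\n",
"assert np.array_equal(x[order], np.sort(x))\n",
"print(order)\n",
"print(names[order])   # reorder the parallel array the same way"
],
"language": "python",
"metadata": {},
"outputs": []
},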
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this sort of thing numpy provides what is called \"fancy indexing\".\n",
"It is not nearly as quick and lightweight as slicing, but it allows one\n",
"to do some rather sophisticated things while letting numpy do all the\n",
"hard work in C.\n",
"\n",
"### Boolean indexing\n",
"\n",
"It frequently happens that one wants to select or modify only the\n",
"elements of an array satisfying some condition. numpy provides several\n",
"tools for working with this sort of situation. The first is boolean\n",
"arrays. Comparisons - equal to, less than, and so on - between numpy\n",
"arrays produce arrays of boolean values:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> A<5\n",
"array([ True, True, True, True, True, False, False, False, False, False], dtype=bool)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are normal arrays. The actual storage type is normally a single\n",
"byte per value, not bits packed into a byte, but boolean arrays offer\n",
"the same range of indexing and array-wise operations as other arrays.\n",
"Unfortunately, python's \"and\" and \"or\" cannot be overridden to do\n",
"array-wise operations, so you must use the bitwise operators \"&\", \"|\",\n",
"and \"\\^\" (for exclusive-or). Similarly, python's chained inequalities\n",
"cannot be overridden. Also, regrettably, one cannot change the precedence\n",
"of the bitwise operators, which bind more tightly than comparisons, so\n",
"A<5 & A>1 is parsed as the chained comparison A < (5 & A) > 1:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> c = A<5 & A>1\n",
"Traceback (most recent call last):\n",
" File \"<stdin>\", line 1, in <module>\n",
"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()\n",
">>> c = (A<5) & (A>1)\n",
">>> c\n",
"array([False, False, True, True, True, False, False, False, False, False], dtype=bool)"
],
"language": "python",
"metadata": {},
"outputs": []
},
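{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a short sketch of the alternatives. np.logical\\_and() and\n",
"np.logical\\_or() are function forms of the bitwise operators that avoid\n",
"the precedence problem, and ~ negates a boolean array elementwise. A\n",
"chained comparison fails because python has to ask for the truth value\n",
"of a whole intermediate array:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"A = np.arange(10)\n",
"c = (A < 5) & (A > 1)\n",
"\n",
"# function forms of & and | give the same boolean arrays\n",
"assert np.array_equal(c, np.logical_and(A < 5, A > 1))\n",
"either = np.logical_or(A < 2, A > 7)\n",
"print(either)\n",
"\n",
"# ~ flips every element of a boolean array\n",
"not_c = ~c\n",
"print(not_c)\n",
"\n",
"# a chained comparison needs the truth value of the whole array A > 1\n",
"try:\n",
"    1 < A < 5\n",
"except ValueError as err:\n",
"    message = str(err)\n",
"print(message)"
],
"language": "python",
"metadata": {},
"outputs": []
},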
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nevertheless, numpy's boolean arrays are extremely powerful.\n",
"\n",
"One can use boolean arrays to extract values from arrays:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> c = (A<5) & (A>1)\n",
">>> A[c]\n",
"array([2, 3, 4])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is necessarily a copy of the original array, rather than a\n",
"view, since the elements of c that are True will not normally select an\n",
"evenly-strided memory layout. Nevertheless, it is also possible to use\n",
"boolean arrays to write to specific elements:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> T = A.copy()\n",
">>> c = (A<5) & (A>1)\n",
">>> T[c] = -7\n",
">>> T\n",
"array([ 0, 1, -7, -7, -7, 5, 6, 7, 8, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
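{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following short sketch makes the copy-versus-view difference\n",
"visible. np.shares\\_memory() (available since numpy 1.11) reports\n",
"whether two arrays overlap in memory:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"A = np.arange(10)\n",
"c = (A < 5) & (A > 1)\n",
"\n",
"extracted = A[c]      # boolean indexing on the right-hand side: a copy\n",
"extracted[:] = 100    # changes the copy only; A is untouched\n",
"window = A[2:5]       # a slice: a view onto A's memory\n",
"print(np.shares_memory(extracted, A), np.shares_memory(window, A))\n",
"\n",
"T = A.copy()\n",
"T[c] = -7             # boolean indexing on the left-hand side writes in place\n",
"print(T)"
],
"language": "python",
"metadata": {},
"outputs": []
},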
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FIXME: mention where()\n",
"\n",
"#### Multidimensional boolean indexing\n",
"\n",
"Boolean indexing works for multidimensional arrays as well. In its\n",
"simplest (and most common) incarnation, you simply supply a single\n",
"boolean array as index, the same shape as the original array:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> C[C%5==0]\n",
"array([ 0, 5, 10, 15, 20])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You then get back a one-dimensional array of the elements for which the\n",
"condition is True. (The result must be one-dimensional, since the True\n",
"values can be scattered arbitrarily through the array. If you\n",
"want to keep track of the arrangement of values in the original array,\n",
"look into using numpy's \"masked array\" tools.) You can also use boolean\n",
"indexing for assignment, just as you can for one-dimensional arrays.\n",
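"\n",
"Here is a sketch of both points on C. np.ma.masked\\_where() masks the\n",
"entries where its condition is True, so the condition is inverted to\n",
"keep the multiples of 5:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"C = np.reshape(np.arange(2*3*4), (2, 3, 4))\n",
"\n",
"# keep the (2, 3, 4) layout, hiding entries that are not multiples of 5\n",
"m = np.ma.masked_where(C % 5 != 0, C)\n",
"print(m.shape)          # same shape as C\n",
"print(m.compressed())   # just the unmasked values, as a 1-D array\n",
"\n",
"# boolean indexing also works on the left-hand side\n",
"T = C.copy()\n",
"T[T % 5 == 0] = -1\n",
"print(T)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [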
"\n",
"Two very useful operations on boolean arrays are np.any() and np.all():"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> np.any(B<5)\n",
"True\n",
">>> np.all(B<5)\n",
"False"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"They do just what they say on the tin: they evaluate whether any entry\n",
"in the boolean array is True, or whether all of its entries are True.\n",
"But they can also be used to evaluate \"along an axis\", for example, to\n",
"produce a boolean array saying whether any element in a given row is\n",
"True:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> B<5\n",
"array([[ True, True, True],\n",
" [ True, True, False],\n",
" [False, False, False]], dtype=bool)\n",
">>> np.any(B<5, axis=1)\n",
"array([ True, True, False], dtype=bool)\n",
">>> np.all(B<5, axis=1)\n",
"array([ True, False, False], dtype=bool)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One can also use boolean indexing to pull out rows or columns meeting\n",
"some criterion:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> B[np.any(B<5, axis=1),:]\n",
"array([[0, 1, 2],\n",
" [3, 4, 5]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result here is two-dimensional because there is one dimension for\n",
"the results of the boolean indexing, and one dimension because each row\n",
"is one-dimensional.\n",
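"\n",
"Columns can be selected the same way by putting the boolean array in the\n",
"second position. Here np.any() runs down each column (axis=0). A short\n",
"sketch:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"\n",
"col_mask = np.any(B > 6, axis=0)   # does any entry in each column exceed 6?\n",
"print(col_mask)\n",
"print(B[:, col_mask])             # all rows, only the selected columns"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [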
"\n",
"This works with higher-dimensional boolean arrays as well:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> c = np.any(C<5,axis=2)\n",
">>> c\n",
"array([[ True, True, False],\n",
" [False, False, False]], dtype=bool)\n",
">>> C[c,:]\n",
"array([[0, 1, 2, 3],\n",
" [4, 5, 6, 7]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here too the result is two-dimensional, though that is perhaps a little\n",
"more surprising. The boolean array is two-dimensional, but the part of\n",
"the return value corresponding to the boolean array must be\n",
"one-dimensional, since the True values may be distributed arbitrarily.\n",
"The subarray of C corresponding to each True or False value is\n",
"one-dimensional, so we get a return array of dimension two.\n",
"\n",
"Finally, if you want to apply boolean conditions to the rows and columns\n",
"simultaneously, beware:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> B[np.array([True, False, True]), np.array([False, True, True])]\n",
"array([1, 8])\n",
">>> B[np.array([True, False, True]),:][:,np.array([False, True, True])]\n",
"array([[1, 2],\n",
" [7, 8]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
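"The obvious approach doesn't give the right answer. Each boolean array\n",
"is first converted to the integer positions of its True entries (here\n",
"[0, 2] for the rows and [1, 2] for the columns). These positions are\n",
"then paired up as in list-of-locations indexing, described below, so the\n",
"result is [B[0,1], B[2,2]]. You can get the right answer by indexing\n",
"twice, but that's clumsy and inefficient, and it doesn't allow\n",
"assignment because the first indexing step already returns a copy.\n",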
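"\n",
"Here is a short sketch of what happens. Each boolean array behaves like\n",
"the integer positions of its True entries (as returned by np.nonzero()),\n",
"and those positions are then paired up elementwise. np.ix\\_(), described\n",
"below under picking out rows and columns, also accepts boolean arrays.\n",
"It gives the submatrix in one step, and that form works for assignment\n",
"too:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"rows = np.array([True, False, True])\n",
"cols = np.array([False, True, True])\n",
"\n",
"# each mask acts like the integer positions of its True entries\n",
"r = np.nonzero(rows)[0]   # rows 0 and 2\n",
"k = np.nonzero(cols)[0]   # columns 1 and 2\n",
"assert np.array_equal(B[rows, cols], B[r, k])   # i.e. B[0,1] and B[2,2]\n",
"\n",
"# np.ix_ makes every selected row meet every selected column\n",
"sub = B[np.ix_(rows, cols)]\n",
"print(sub)\n",
"\n",
"T = B.copy()\n",
"T[np.ix_(rows, cols)] = 0   # assignment through the same index works\n",
"print(T)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [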
"\n",
"FIXME: works with too-small boolean arrays for some reason?\n",
"\n",
"### List-of-locations indexing\n",
"\n",
"It happens with some frequency that one wants to pull out values at a\n",
"particular locations in an array. If one wants a single location, one can\n",
"just use simple indexing. But if there are many locations, you need\n",
"something a bit more clever. Fortunately numpy supports a mode of fancy\n",
"indexing that accomplishes this:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> primes = np.array([2,3,5,7,11,13,17,19,23])\n",
">>> idx = [3,4,1,2,2]\n",
">>> primes[idx]\n",
"array([ 7, 11, 3, 5, 5])\n",
">>> idx = np.array([3,4,1,2,2])\n",
">>> primes[idx]\n",
"array([ 7, 11, 3, 5, 5])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you index with an array of integers, or with a list of integers,\n",
"numpy treats it as an array of indices. The array can be any shape, and\n",
"the returned array has the same shape:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> primes = np.array([2,3,5,7,11,13,17,19,23,29,31])\n",
">>> primes[B]\n",
"array([[ 2, 3, 5],\n",
" [ 7, 11, 13],\n",
" [17, 19, 23]])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Effectively this uses the original array as a look-up table.\n",
"\n",
"You can also assign to arrays in this way:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> T = A.copy()\n",
">>> T[ [1,3,5,0] ] = -np.arange(4)\n",
">>> T\n",
"array([-3, 0, 2, -1, 4, -2, 6, 7, 8, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** Augmented assignment - the operators like \"+=\" - works, but\n",
"it does not necessarily do what you would expect. In particular,\n",
"repeated indices do not result in the value getting added twice:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> T = A.copy()\n",
">>> T[ [0,1,2,3,3,3] ] += 10\n",
">>> T\n",
"array([10, 11, 12, 13, 4, 5, 6, 7, 8, 9])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
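"This is surprising, inconvenient, and unfortunate, but it is a direct\n",
"result of how python implements the \"+=\" operators. T[idx] += 10 is\n",
"carried out roughly as T[idx] = T[idx] + 10, so the right-hand side is\n",
"computed once and each repeated index is simply written more than once\n",
"with the same value. The most common case for doing this is something\n",
"histogram-like:"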
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> bins = np.zeros(5,dtype=np.int32)\n",
">>> pos = [1,0,2,0,3]\n",
">>> wts = [1,2,1,1,4]\n",
">>> bins[pos]+=wts\n",
">>> bins\n",
"array([1, 1, 1, 4, 0])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately this gives the wrong answer. In older versions of numpy\n",
"there was no really satisfactory solution, but as of numpy 1.1, the\n",
"histogram function can do this:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> pos = [1,0,2,0,3]\n",
">>> wts = [1,2,1,1,4]\n",
">>> np.histogram(pos,bins=5,range=(0,5),weights=wts)\n",
"(array([3, 1, 1, 4, 0]), array([ 0., 1., 2., 3., 4., 5.]))"
],
"language": "python",
"metadata": {},
"outputs": []
},
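{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two other tools count repeated indices correctly; here is a short sketch\n",
"(np.add.at() needs numpy 1.8 or later). The at() method of a ufunc such\n",
"as np.add performs unbuffered in-place operations, so every occurrence of\n",
"a repeated index contributes. np.bincount() sums the weights that fall\n",
"into each integer bin:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"pos = [1, 0, 2, 0, 3]\n",
"wts = [1, 2, 1, 1, 4]\n",
"\n",
"# unbuffered: index 0 appears twice, and both of its weights are added\n",
"bins = np.zeros(5, dtype=int)\n",
"np.add.at(bins, pos, wts)\n",
"print(bins)\n",
"\n",
"# bincount returns floats when weights are given\n",
"counts = np.bincount(pos, weights=wts, minlength=5)\n",
"print(counts)"
],
"language": "python",
"metadata": {},
"outputs": []
},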
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FIXME: mention put() and take()\n",
"\n",
"#### Multidimensional list-of-locations indexing\n",
"\n",
"One can also, not too surprisingly, use list-of-locations indexing on\n",
"multidimensional arrays. The syntax is, however, a bit surprising. Let's\n",
"suppose we want the list [B[0,0],B[1,2],B[0,1]]. Then we write:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> B[ [0,1,0], [0,2,1] ]\n",
"array([0, 5, 1])\n",
">>> [B[0,0],B[1,2],B[0,1]]\n",
"[0, 5, 1]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This may seem weird - why not provide a list of tuples representing\n",
"coordinates? Well, the reason is basically that for large arrays, lists\n",
"and tuples are very inefficient, so numpy is designed to work with\n",
"arrays only, for indices as well as values. This means that something\n",
"like B[ [(0,0),(1,2),(0,1)] ] looks just like indexing B with a\n",
"two-dimensional array, which as we saw above just means that B should be\n",
"used as a look-up table yielding a two-dimensional array of results\n",
"(each of which is one-dimensional, as usual when we supply only one\n",
"index to a two-dimensional array).\n",
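"\n",
"If you do start with a list of coordinate tuples, one way to convert it\n",
"is to transpose it into one index array per axis. A minimal sketch\n",
"(coords is just example data):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"coords = [(0, 0), (1, 2), (0, 1)]\n",
"\n",
"# as a single (3, 2) index array, this only indexes the rows of B\n",
"print(B[np.array(coords)].shape)\n",
"\n",
"# transposed into a tuple of per-axis index arrays, it picks single elements\n",
"idx = tuple(np.array(coords).T)\n",
"print(idx)\n",
"print(B[idx])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [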
"\n",
"In summary, in list-of-locations indexing, you supply an array of values\n",
"for each coordinate, all the same shape, and numpy returns an array of\n",
"the same shape containing the values obtained by looking up each set of\n",
"coordinates in the original array. If the coordinate arrays are not the\n",
"same shape, numpy's broadcasting rules are applied to them to try to\n",
"make their shapes the same. If there are not as many arrays as the\n",
"original array has dimensions, the original array is regarded as\n",
"containing arrays, and the extra dimensions appear on the result array.\n",
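"\n",
"Here is a short sketch of the broadcasting and missing-dimension rules,\n",
"using arrays shaped like B and C from the top of the page:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"C = np.reshape(np.arange(2*3*4), (2, 3, 4))\n",
"\n",
"# a scalar broadcasts against a list: column 1 of rows 0 and 2\n",
"print(B[[0, 2], 1])\n",
"\n",
"# shapes (2, 1) and (3,) broadcast to (2, 3)\n",
"rows = np.array([[0], [2]])\n",
"cols = np.array([0, 1, 2])\n",
"print(B[rows, cols])\n",
"\n",
"# two index arrays for a 3-D array: the last axis comes along whole\n",
"picked = C[[0, 1], [2, 0]]   # C[0, 2] and C[1, 0]\n",
"print(picked.shape)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [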
"\n",
"Fortunately, most of the time when one wants to supply a list of\n",
"locations to a multidimensional array, one got the list from numpy in\n",
"the first place. A normal way to do this is something like:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> idx = np.nonzero(B%2)\n",
">>> idx\n",
"(array([0, 1, 1, 2]), array([1, 0, 2, 1]))\n",
">>> B[idx]\n",
"array([1, 3, 5, 7])\n",
">>> B[B%2 != 0]\n",
"array([1, 3, 5, 7])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here nonzero() takes an array and returns a tuple of index arrays, one\n",
"per dimension, giving the locations where the array is nonzero. This is\n",
"exactly the format that list-of-locations indexing expects. Of course,\n",
"one can also index directly into the array with a boolean array; this\n",
"will usually be more efficient unless the number of nonzero locations is\n",
"small and the indexing is done many times. But sometimes it is valuable\n",
"to work with the list of indices directly.\n",
"\n",
"#### Picking out rows and columns\n",
"\n",
"One unfortunate consequence of numpy's list-of-locations indexing syntax\n",
"is that users used to other array languages expect it to pick out rows\n",
"and columns. After all, it's quite reasonable to want to pull out a list\n",
"of rows and columns from a matrix. So numpy provides a convenience\n",
"function, ix\\_() for doing this:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> B[ np.ix_([0,2],[0,2]) ]\n",
"array([[0, 2],\n",
" [6, 8]])\n",
">>> np.ix_([0,2],[0,2])\n",
"(array([[0],\n",
" [2]]), array([[0, 2]]))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way it works is by taking advantage of numpy's broadcasting\n",
"facilities. You can see that the two arrays used as row and column\n",
"indices have different shapes; numpy's broadcasting repeats each along\n",
"the too-short axis so that they conform.\n",
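"\n",
"As a sketch of what np.ix\\_() saves you from writing, the same selection\n",
"can be built by hand with np.newaxis. The ix\\_ form can also be\n",
"assigned to:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"rows, cols = np.ix_([0, 2], [0, 2])\n",
"print(rows.shape, cols.shape)   # a column of row indices, a row of column indices\n",
"\n",
"# the same thing by hand: shapes (2, 1) and (2,) broadcast to (2, 2)\n",
"manual = B[np.array([0, 2])[:, np.newaxis], np.array([0, 2])]\n",
"assert np.array_equal(manual, B[rows, cols])\n",
"\n",
"T = B.copy()\n",
"T[np.ix_([0, 2], [0, 2])] = -1   # writes the four corner elements\n",
"print(T)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [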
"\n",
"Mixed indexing modes\n",
"--------------------\n",
"\n",
"What happens when you try to mix slice indexing, element indexing,\n",
"boolean indexing, and list-of-locations indexing?\n",
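"\n",
"A short sketch of a few combinations follows. A slice can be paired with\n",
"a list of locations, and a boolean array with a single integer. One case\n",
"is surprising. Once a list is present, an integer also counts as a\n",
"fancy index, and when the fancy indices are separated by a slice, the\n",
"dimension they produce is moved to the front of the result:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"B = np.reshape(np.arange(9), (3, 3))\n",
"C = np.reshape(np.arange(2*3*4), (2, 3, 4))\n",
"\n",
"# a slice for the rows, a list of locations for the columns\n",
"print(B[1:, [0, 2]])\n",
"\n",
"# a boolean array for the rows, a single integer for the column\n",
"print(B[np.array([True, False, True]), 1])\n",
"\n",
"# integer and list separated by a slice: their axis comes first\n",
"print(C[0, :, [1, 3]].shape)   # (2, 3), not (3, 2)\n",
"print(C[0][:, [1, 3]].shape)   # (3, 2) when indexed in two steps"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [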
"\n",
"How indexing works under the hood\n",
"---------------------------------\n",
"\n",
"A numpy array is a block of memory, a data type for interpreting memory\n",
"locations, a list of sizes (the shape), and a list of strides. So, for\n",
"example, C[i,j,k] is the element starting at byte offset\n",
"i\\*strides[0]+j\\*strides[1]+k\\*strides[2] from the start of the block.\n",
"This means that transposing a matrix can be done very efficiently: just\n",
"reverse the strides and sizes arrays. It is also why slices are\n",
"efficient and can return views (a slice only changes the starting point,\n",
"sizes, and strides), while fancy indexing is slower and can't (the\n",
"selected elements need not lie at evenly spaced offsets).\n",
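"\n",
"The following sketch checks these claims on C. The offset arithmetic\n",
"assumes C is stored contiguously in C order, which holds for an array\n",
"freshly built with np.arange() and np.reshape():"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"C = np.reshape(np.arange(2*3*4), (2, 3, 4))\n",
"i, j, k = 1, 2, 3\n",
"\n",
"# byte offset of C[i,j,k] from the start of the data block\n",
"offset = i*C.strides[0] + j*C.strides[1] + k*C.strides[2]\n",
"# C is contiguous, so dividing by the item size gives a flat position\n",
"assert C.flat[offset // C.itemsize] == C[i, j, k]\n",
"\n",
"# transposing just reverses the shape and strides; no data is copied\n",
"print(C.shape, C.T.shape)\n",
"print(C.strides, C.T.strides)\n",
"\n",
"view = C[:, ::2, 1:]      # a slice: new strides, same memory\n",
"fancy = C[:, [0, 2], 1:]  # fancy indexing: new memory\n",
"print(np.shares_memory(view, C), np.shares_memory(fancy, C))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [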
"\n",
"At the python level, numpy's indexing works by implementing the\n",
"\\_\\_getitem\\_\\_ and \\_\\_setitem\\_\\_ methods of the ndarray class.\n",
"Python calls these methods whenever an object is indexed, and they allow\n",
"arbitrary implementations:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
">>> class IndexDemo:\n",
"...     def __getitem__(self, *args):\n",
"...         print(\"__getitem__\", args)\n",
"...         return 1\n",
"...     def __setitem__(self, *args):\n",
"...         print(\"__setitem__\", args)\n",
"...     def __iadd__(self, *args):\n",
"...         print(\"__iadd__\", args)\n",
"... \n",
">>> \n",
">>> T = IndexDemo()\n",
">>> T[1]\n",
"__getitem__ (1,)\n",
"1\n",
">>> T[\"fish\"]\n",
"__getitem__ ('fish',)\n",
"1\n",
">>> T[A]\n",
"__getitem__ (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),)\n",
"1\n",
">>> T[1,2]\n",
"__getitem__ ((1, 2),)\n",
"1\n",
">>> T[1] = 7\n",
"__setitem__ (1, 7)\n",
">>> T[1] += 7\n",
"__getitem__ (1,)\n",
"__setitem__ (1, 8)"
],
"language": "python",
"metadata": {},
"outputs": []
},
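{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same mechanism explains how numpy sees slices. Before calling\n",
"\\_\\_getitem\\_\\_, python packs the index expression into ordinary\n",
"objects: a slice object for each colon expression, a tuple when there\n",
"are commas, and Ellipsis for .... A minimal sketch, using a hypothetical\n",
"Recorder class that simply returns the key it receives:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"class Recorder:\n",
"    # hypothetical helper: hands back the key python passes to __getitem__\n",
"    def __getitem__(self, key):\n",
"        return key\n",
"\n",
"r = Recorder()\n",
"print(r[1:3])        # a slice object\n",
"print(r[1:3, ::2])   # a tuple of slice objects\n",
"print(r[..., 0])     # Ellipsis inside a tuple"
],
"language": "python",
"metadata": {},
"outputs": []
},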
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Array-like objects\n",
"------------------\n",
"\n",
"numpy and scipy provide a few other types that behave like arrays, in\n",
"particular matrices and sparse matrices. Their indexing can differ from\n",
"that of arrays in surprising ways.\n"
]
}
],
"metadata": {}
}
]
}