NumPy Internals: An Introduction

The world under the covers

9 min readNov 15, 2022

Introduction

This is a short article on how NumPy handles arrays internally. It is considered an advanced topic and deep understanding is not strictly required for many NumPy casual users. However, my belief is that knowing fundamental NumPy concepts is useful regardless if you are using NumPy directly or indirectly, e.g. through pandas. It is not only about satisfying personal curiosity, but more about fine tuning performance when arrays are filling up memory. In such cases creating yet another copy, even if transient, can be an issue. The article does not go very deep, but hopefully it manages to whet the appetite of curious readers. The NumPy documentation and Travis Oliphant’s book Guide to NumPy are infinitely more informative. For my needs, and perhaps the needs of the typical data analyst, this article can provide a foundation that is adequate for everyday tasks.

Arrays in NumPy have essentially two parts: the data buffer that contains the actual (raw) data, and the information about how the raw array data can be accessed and used that can be thought of as array metadata. In languages like C or Fortran the raw data is all there is: a contiguous (and fixed) block of memory containing fixed-sized data items. Much of the flexibility of NumPy is because of the array metadata. The data buffer can be interpreted in many different ways without being recreated. For small arrays this is not so important. For larger data volumes copying the array buffer, perhaps repeatedly, can become a bottleneck. As an example, when an NumPy array is sliced, the metadata are altered but the data buffer remains the same, i.e. the array base points to the original array from which the slice was taken. This article will enumerate a set of examples that helped me understand the underlying machinery and become more confident with using NumPy and pandas, without claiming that I have fully mastered all details. Doing so would require studying the source code and writing an article for a different target audience. My curiosity was triggered by looking at NumPy docstrings such the entry point in NumPy, i.e. the array creation

Figure 1: Specifying the memory layout during array creation in NumPy (image created by the author)

What is the order parameter supposed to do and why do we need it? Admittedly, it is rare that one needs to specify the order parameter, but at the same time I have seen NumPy complaining that the shape is incompatible for a non-contiguous array. My first question when I saw this was “how can it be that an array is non-contiguous?” followed shortly after by “how can I even test if an array is non-contiguous?”. I am not the only one having the same kind of worries. If you are also curious about the NumPy array internal organisation please keep reading!

How are NumPy arrays handled?

Before we are in position to experiment with different memory layouts, strides and array transformations let’s put together a utility function

The function uses f-strings in all of their glory to print succinctly a number of array attributes that will be helpful later on. The code also defines a 4x3 array of integers (int32) and uses the utility function to print the attributes

Let’s see what all these mean. The utility function creates a succinct string representation of the array using the np.array_str function, that was further modified to replace new lines with a comma, respecting the constraint that f-strings cannot contain backslashes, and hence \n, within brackets. The second piece of information is taken from the array flags object that provides information about the memory layout of the array. We only kept two of its attributes that show if the data buffer is in a single C-style (or row based) or Fortran-style (column based) contiguous segment. The next piece of information is the whole np.ndarray.__array_interface__ dictionary that contains up to eight items. We will not cover them all. Important keys for this article are:

shape a tuple with the array size in each dimension that can also be obtained with np.ndarray.shape
typestr a string with the basic type (e.g. i stands for integer) and byteorder of the data followed by an integer providing the number of bytes the type uses; with the exception of the byteorder the type can also be obtained with np.ndarray.dtype that is also returned by the utility function
the first entry of the data tuple is a pointer to the first element of the data buffer
strides is either None to indicate a C-style contiguous array or a tuple of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension; the utility function also provides np.ndarray.strides that is always populated.

The utility function also returns the base object if memory is from some other object, formatting it to fit on one line, as was done with the array itself. It also returns the memory layout of the base. With this information we achieve a basic introspection to understand what is happening under the hood.

Returning to the example, the array a passed to the utility function has shape (4, 3) and its dtype is int32, so each element consumes 4 bytes. Because the starting point was a Python list, the created NumPy array was in C-order as explained in the documentation. In order to move one element to the right in the same row (2nd dimension) we need to jump 4 bytes. To move one element down in the same column (1st dimension) we need to jump by 3x4 =12 bytes that is consistent with the (12, 4) stridestuple.

Let’s see what happens if we create the array using F-order memory layout

that prints

The strides tuple has changed. Now moving down by one element jumps 4 bytes because the array is stored in column-major order. On the other hand the array itself is the same, has the same shape and can be indexed in the same way. This explains why most casual users of NumPy may not look into the memory layout.

Does the memory layout have an effect on performance?

Consider the following summation over rows and columns for an array with 100 million float64 elements

that gives

Figure 2: Effect of memory layout in summation (image created by the author)

We can see that the summation over rows is twice as fast when the array is stored in row-major (C-order) compared to column-major (F-order). It is the exact opposite for a summation over columns. One could argue that this is a small difference and I would agree, although it shows that the memory layout is not irrelevant either.

Reshaping

A first situation to explore is array reshaping

that prints

The first thing to note is that the two reshaping operations do not produce the same result. This is because the order parameter of the np.ndarray.reshape has nothing to do with the memory layout of the underlying array, but defines how elements are read and placed into the reshaped array. ‘C’ reads and places the elements in row-major order, i.e. with the last axis index changing fastest. ‘F’ does the opposite. In both reshape operations the reshaped array is a view. However:

in the C-order reshape the base is the original array (b.base is a returns True). Taking into account the base and its C-order layout we can deduce that if we want to move one element down in the reshaped array we need to make a 6x4 = 24 bytes jump and moving one element to the right a 4 bytes jump. This is consistent with the strides information reported by the utility function.
in the F-order reshape the base is not the original array (b.base is a returns False, the pointer to the first element has changed and np.shares_momeory(a, b) returns False). In other words, the elements were relocated in memory. My understanding is that this technically constitutes a copy, although the base of the reshaped array is not None (please comment if your understanding is different!). Knowing the base and its F-order memory layout, it is once more easy to deduce the strides. To go from element 1 to 4 in the reshaped array we need to jump by 4 bytes and to go from element 1 to 7 by 2x4 = 8 bytes.

In the previous example the original array has C-order memory layout. If we were to start with an F-order initial array we would see that the C-order reshape leads to a copy and the F-order reshape leads to a view

that prints

This example is useful for demonstrating another way to tell when reshaping leads to a copy. According to the NumPy documentation setting the array shape directly leads to in-place C-order reshaping. Reshaping in place is bound to fail if a copy needs to be made as in the last C-order reshape

that gives

Traceback (most recent call last):
  File "D:\myLearningBooks\2022_10_30_PythonForDataAnalytics3rdEdition\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3378, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1405-1144ef469292>", line 3, in <module>
    a.shape = (2, -1)
AttributeError: Incompatible shape for in-place modification. Use `.reshape()` to make a copy with the desired shape.

The documentation discourages setting the shape tuple directly in favour of the np.ndarray.reshape function, stating that it may be deprecated. The documentation also states that if you want an error to be raised when the data is copied, you should assign the new shape to the shape attribute of the array. This is somehow inconsistent in my view, and I would expect the documentation to be streamlined in the future. The utility function in this article provides an alternative and hopefully comprehensive introspection, that includes information about when a copy is made.

Transposing

This may sound easier case than the reshape, but there are some things to discover. We will start with the same array as before, with the default C-order memory layout

that prints

The base of the transposed array is the original array that maintains its C-order memory layout. To move one element down in the transposed array, i.e. to move from element 1 to element 2, we need to jump by 4 bytes. To move one element to the right in the transposed array, i.e. to move from element 1 to element 4, we need to jump by 3x4 = 12 bytes. These are consistent with the strides obtained by the utility function. The memory layout of the transposed array is F-order. Essentially, this means that the transpose operation required little change in the internal representation of the array. It was sufficient to read the data buffer in column-major rather than row-major order. I find this quite clever and a good demonstration of the underlying NumPy engine. Nothing was copied, but the array was still transposed.

Slicing

The last example is about slicing

that prints

For all three slices the base remains the original, C-order array as also indicated by the pointer to the memory address of the first element. The strides also remain unchanged. When rows are selected the sliced array remains C-contiguous. However, once we select columns the sliced array is neither C-contiguous nor F-contiguous. Figure 3 shows the reasoning.

Figure 3: Non-contiguous array after slicing (image created by the author)

Although the sliced array is non-contiguous, there is no need to create a copy. We can still use the strides. To move one element down in the reshaped array we need to make a 3x4 = 12 bytes jump and moving one element to the right requires a 4 bytes jump. Pretty neat. NumPy offers the possibility to create a contiguous array from a non-contiguous one with thenp.ascontiguousarray() function

but as expected this necessitates the creation of a copy

Unless if one wants to carry out repeating row or column wide operations, I am not so sure why this would be useful given that it has its own memory and processing time cost, but NumPy offers the possibility.

Conclusions

NumPy is not new. Still, whenever I think about it I can only be impressed with its ingenuity. The addition of array metadata on top of the raw data allows for a very flexible use of arrays. By changing the metadata it is possible to change the shape, transpose or slice an array without rearranging the raw data. The data buffer is used again and again and by being interpreted differently it is possible to create new views with negligible computational cost. This is darn clever and this article does little justice to the brilliance of NumPy. I still hope though that it helps understanding concepts such as memory layout and strides. In addition to allowing the user using NumPy in a performant way, acquiring this knowledge is a first step towards using more esoteric NumPy functions, such np.stride_tricks.as_strided to create views with given shape and strides. This sounds intriguing but needs to be done with care as it is possible to point to invalid memory, crash the program and in more unfortunate situations corrupt the results. If you want to go this far you can refer to this article and its references.

TDS Archive

NumPy Internals: An Introduction

The world under the covers

Introduction

How are NumPy arrays handled?

Does the memory layout have an effect on performance?

Reshaping

Transposing

Slicing

Conclusions

Published in TDS Archive

Written by Pan Cretan