Differences between revisions 29 and 30

Deletions are marked like this. | Additions are marked like this. |

Line 491: | Line 491: |

---- | |

Line 495: | Line 495: |

---- | |

Line 499: | Line 499: |

---- * In vecsq(), line 86, since you will be creating a 1-dimensional vector: {{{int i,j,n,m, dims[2]; }}} should be: {{{int i,j,n,m, dims[1]; }}} * In {{{pyvector_to_Carrayptrs()}}}, {{{n}}} is never used. -- FredSpiessens |

C Extensions for Using NumPy Arrays

I've written several C extensions that handle NumPy arrays. They are simple, but they seem to work well. They will show you how to pass Python variables and NumPy arrays to your C code. Once you learn how to do it, it's pretty straight-forward. I suspect they will suffice for most numerical code. I've written it up as a draft and have made the code and document file available. I found for my numerical needs I really only need to pass a limited set of things (integers, floats, strings, and NumPy arrays). If that's your category, this code might help you.

I have tested the routines and so far,so good, but I cannot guarantee anything. I am a bit new to this. If you find any errors put up a message on the SciPy mailing list.

A link to the tar ball that holds the code and docs is given below.

I have recently updated some information and included more examples. The document presented below is the original documentation which is still useful. The link below holds the latest documentation and source code.

-- Lou Pecora

*What follows is the content of Lou`s word-document originally pasted here as version 1. I (DavidLinke) have converted this to wiki-markup:*

# C Extensions to NumPy and Python

By Lou Pecora - 2006-12-07 (Draft version 0.1)

## Overview

### Introduction– a little background

In my use of Python I came across a typical problem: I needed to speed up particular parts of my code. I am not a Python guru or any kind of coding/computer guru. I use Python for numerical calculations and I make heavy use of Numeric/NumPy. Almost every Python book or tutorial tells you build C extensions to Python when you need a routine to run fast. C extensions are C code that can be compiled and linked to a shared library that can be imported like any Python module and you can call specified C routines like they were Python functions.

Sounds nice, but I had reservations. It looked non-trivial (it is, to an extent). So I searched for other solutions. I found them. They are such approaches as SWIG, Pyrex, ctypes, Psycho, and Weave. I often got the simple examples given to work (not all, however) when I tried these. But I hit a barrier when I tried to apply them to NumPy. Then one gets into typemaps or other hybrid constructs. I am not knocking these approaches, but I could never figure them out and get going on my own code despite lots of online tutorials and helpful suggestions from various Python support groups and emailing lists.

So I decided to see if I could just write my own C extensions. I got help in the form of some simple C extension examples for using Numeric written about 2000 from Tom Loredo of Cornell university. These sat on my hard drive until 5 years later out of desperation I pulled them out and using his examples, I was able to quickly put together several C extensions that (at least for me) handle all of the cases (so far) where I want a speedup. These cases mostly involve passing Python integers, floats (=C doubles), strings, and NumPy 1D and 2D float and integer arrays. I rarely need to pass anything else to a C routine to do a calculation. If you are in the same situation as me, then this package I put together might help you. It turns out to be fairly easy once you get going.

Please note, Tom Loredo is not responsible for any errors in my code or instructions although I am deeply indebted to him. Likewise, this code is for research only. It was tested by only my development and usage. It is not guaranteed, and comes with no warranty. Do not use this code where there are any threats of loss of life, limb, property, or money or anything you or others hold dear.

I developed these C extensions and their Python wrappers on a Macintosh G4 laptop using system OS X 10.4 (essential BSD Unix), Python 2.4, NumPy 0.9x, and the gnu compiler and linker gcc. I think most of what I tell you here will be easily translated to Linux and other Unix systems beyond the Mac. I am not sure about Windows. I hope that my low-level approach will make it easy for Windows users, too.

The code (both C and Python) for the extensions may look like a lot, but it is **very** repetitious. Once you get the main scheme for one extension function you will see that repeated over and over again in all the others with minor variations to handle different arguments or return different objects to the calling routine. Don't be put off by the code. The good news is that for many numerical uses extensions will follow the same format so you can quickly reuse what you already have written for new projects. Focus on one extension function and follow it in detail (in fact, I will do this below). Once you understand it, the other routines will be almost obvious. The same is true of the several utility functions that come with the package. They help you create, test, and manipulate the data and they also have a lot of repetition. The utility functions are also very short and simple so nothing to fear there.

### General Scheme for NumPy Extensions

This will be covered in detail below, but first I wanted to give you a sense of how each extension is organized.

Three things that must be done before your C extension functions in the C source file.

You must include Python and NumPy headers.

- Each extension must be named in a defining structure at the beginning of the file. This is a name used to access the extension from a Python function.
Next an initialization set of calls is made to set up the Python and NumPy calls and interface. It will be the same for all extensions involving NumPy and Python unless you add extensions to access other Python packages or classes beyond NumPy arrays. I will not cover any of that here (because I don't know it). So the init calls can be copied to each extension file.

Each C extension will have the following form.

The arguments will always be the same: (

`PyObject *self`,`PyObject *args`) - Don't worry if you don't know what exactly these are. They are pointers to general Python objects and they are automatically provided by the header files you will use from NumPy and Python itself. You need know no more than that.- The args get processed by a function call that parses them and assigns them to C defined objects.
- Next the results of that parse might be checked by a utility routine that reaches into the structure representing the object and makes sure the data is the right kind (float or int, 1D or 2D array, etc.). Although I included some of these C-level checks, you will see that I think they are better done in Python functions that are used to wrap the C extensions. They are also a lot easier to do in Python. I have plenty of data checks in my calling Python wrappers. Usually this does not lead to much overhead since you are not calling these extensions billions of times in some loop, but using them as a portal to a C or C++ routine to do a long, complex, repetitive, specialized calculation.
After some possible data checks, C data types are initialized to point to the data part of the NumPy arrays with the help of utility functions.

- Next dimension information is extracted so you know the number of columns, rows, vector dimensions, etc.
Now you can use the C arrays to manipulate the data in the NumPy arrays. The C arrays and C data from the above parse point to the original Python/NumPy data so changes you make affect the array values when you go back to Python after the extension returns. You can pass the arrays to other C functions that do calculations, etc. Just remember you are operating on the original NumPy matrices and vectors.

After your calculation you have to free any memory allocated in the construction of your C data for the NumPy arrays. This is done again by Utility functions. This step is only necessary if you allocated memory to handle the arrays (e.g. in the matrix routines), but is not necessary if you have not allocated memory (e.g. in the vector routines).

Finally, you return to the Python calling function, by returning a Python value or NumPy array. I have C extensions which show examples of both.

### Python Wrapper Functions

It is best to call the C extensions by calling a Python function that then calls the extension. This is called a Python wrapper function. It puts a more pythonic look to your code (e.g. you can use keywords easily) and, as I pointed out above, allows you to easily check that the function arguments and data are correct before you had them over to the C extension and other C functions for that big calculation. It may seem like an unnecessary extra step, but it's worth it.

## The Code

In this section I refer to the code in the source files C_arraytest.h, C_arraytest.c, C_arraytest.py, and C_arraytest.mak. You should keep those files handy (probably printed out) so you can follow the explanations of the code below.

### The C Code – one detailed example with utilities

First, I will use the example of code from C_arraytest.h, C_arraytest.c for the routine called matsq. This function takes a (NumPy) matrix *A*, integer *i*, and (Python) float *y* as input and outputs a return (NumPy) matrix *B* each of whose components is equal to the square of the input matrix component times the integer times the float. Mathematically:

*B*=_{ij}*i y*(*A*)_{ij}^{2}

The Python code to call the matsq routine is `A=matsq(B,i,y)`. Here is the relevant code in one place:

The Header file, C_arraytest.h:

1 /* Header to test of C modules for arrays for Python: C_test.c */ 2 3 /* ==== Prototypes =================================== */ 4 5 // .... Python callable Vector functions .................. 6 static PyObject *vecfcn1(PyObject *self, PyObject *args); 7 static PyObject *vecsq(PyObject *self, PyObject *args); 8 9 /* .... C vector utility functions ..................*/ 10 PyArrayObject *pyvector(PyObject *objin); 11 double *pyvector_to_Carrayptrs(PyArrayObject *arrayin); 12 int not_doublevector(PyArrayObject *vec); 13 14 15 /* .... Python callable Matrix functions ..................*/ 16 static PyObject *rowx2(PyObject *self, PyObject *args); 17 static PyObject *rowx2_v2(PyObject *self, PyObject *args); 18 static PyObject *matsq(PyObject *self, PyObject *args); 19 static PyObject *contigmat(PyObject *self, PyObject *args); 20 21 /* .... C matrix utility functions ..................*/ 22 PyArrayObject *pymatrix(PyObject *objin); 23 double **pymatrix_to_Carrayptrs(PyArrayObject *arrayin); 24 double **ptrvector(long n); 25 void free_Carrayptrs(double **v); 26 int not_doublematrix(PyArrayObject *mat); 27 28 /* .... Python callable integer 2D array functions ..................*/ 29 static PyObject *intfcn1(PyObject *self, PyObject *args); 30 31 /* .... C 2D int array utility functions ..................*/ 32 PyArrayObject *pyint2Darray(PyObject *objin); 33 int **pyint2Darray_to_Carrayptrs(PyArrayObject *arrayin); 34 int **ptrintvector(long n); 35 void free_Cint2Darrayptrs(int **v); 36 int not_int2Darray(PyArrayObject *mat); 37

The Source file, C_arraytest.c:

1 /* A file to test imorting C modules for handling arrays to Python */ 2 3 #include "Python.h" 4 #include "arrayobject.h" 5 #include "C_arraytest.h" 6 #include <math.h> 7 8 /* #### Globals #################################### */ 9 10 /* ==== Set up the methods table ====================== */ 11 static PyMethodDef _C_arraytestMethods[] = { 12 {"vecfcn1", vecfcn1, METH_VARARGS}, 13 {"vecsq", vecsq, METH_VARARGS}, 14 {"rowx2", rowx2, METH_VARARGS}, 15 {"rowx2_v2", rowx2_v2, METH_VARARGS}, 16 {"matsq", matsq, METH_VARARGS}, 17 {"contigmat", contigmat, METH_VARARGS}, 18 {"intfcn1", intfcn1, METH_VARARGS}, 19 {NULL, NULL} /* Sentinel - marks the end of this structure */ 20 }; 21 22 /* ==== Initialize the C_test functions ====================== */ 23 // Module name must be _C_arraytest in compile and linked 24 void init_C_arraytest() { 25 (void) Py_InitModule("_C_arraytest", _C_arraytestMethods); 26 import_array(); // Must be present for NumPy. Called first after above line. 27 } 28 29 /* #### Vector Extensions ############################## */ 30 31 /* ==== vector function - manipulate vector in place ====================== 32 Multiply the input by 2 x dfac and put in output 33 Interface: vecfcn1(vec1, vec2, str1, d1) 34 vec1, vec2 are NumPy vectors, 35 str1 is Python string, d1 is Python float (double) 36 Returns integer 1 if successful */ 37 static PyObject *vecfcn1(PyObject *self, PyObject *args) 38 { 39 PyArrayObject *vecin, *vecout; // The python objects to be extracted from the args 40 double *cin, *cout; // The C vectors to be created to point to the 41 // python vectors, cin and cout point to the row 42 // of vecin and vecout, respectively 43 int i,j,n; 44 const char *str; 45 double dfac; 46 47 /* Parse tuples separately since args will differ between C fcns */ 48 if (!PyArg_ParseTuple(args, "O!O!sd", &PyArray_Type, &vecin, 49 &PyArray_Type, &vecout, &str, &dfac)) return NULL; 50 if (NULL == vecin) return NULL; 51 if (NULL == vecout) return NULL; 52 53 // Print out input string 54 printf("Input string: %s\n", str); 55 56 /* Check that objects are 'double' type and vectors 57 Not needed if python wrapper function checks before call to this routine */ 58 if (not_doublevector(vecin)) return NULL; 59 if (not_doublevector(vecout)) return NULL; 60 61 /* Change contiguous arrays into C * arrays */ 62 cin=pyvector_to_Carrayptrs(vecin); 63 cout=pyvector_to_Carrayptrs(vecout); 64 65 /* Get vector dimension. */ 66 n=vecin->dimensions[0]; 67 68 /* Operate on the vectors */ 69 for ( i=0; i<n; i++) { 70 cout[i]=2.0*dfac*cin[i]; 71 } 72 73 return Py_BuildValue("i", 1); 74 } 75 76 /* ==== Square vector components & multiply by a float ========================= 77 Returns a NEW NumPy vector array 78 interface: vecsq(vec1, x1) 79 vec1 is NumPy vector, x1 is Python float (double) 80 returns a NumPy vector */ 81 static PyObject *vecsq(PyObject *self, PyObject *args) { 82 PyArrayObject *vecin, *vecout; 83 double *cin, *cout, dfactor; // The C vectors to be created to point to the 84 // python vectors, cin and cout point to the row 85 // of vecin and vecout, respectively 86 int i,j,n,m, dims[2]; 87 88 /* Parse tuples separately since args will differ between C fcns */ 89 if (!PyArg_ParseTuple(args, "O!d", 90 &PyArray_Type, &vecin, &dfactor)) return NULL; 91 if (NULL == vecin) return NULL; 92 93 /* Check that object input is 'double' type and a vector 94 Not needed if python wrapper function checks before call to this routine */ 95 if (not_doublevector(vecin)) return NULL; 96 97 /* Get the dimension of the input */ 98 n=dims[0]=vecin->dimensions[0]; 99 100 /* Make a new double vector of same dimension */ 101 vecout=(PyArrayObject *) PyArray_FromDims(1,dims,NPY_DOUBLE); 102 103 /* Change contiguous arrays into C *arrays */ 104 cin=pyvector_to_Carrayptrs(vecin); 105 cout=pyvector_to_Carrayptrs(vecout); 106 107 /* Do the calculation. */ 108 for ( i=0; i<n; i++) { 109 cout[i]= dfactor*cin[i]*cin[i]; 110 } 111 112 return PyArray_Return(vecout); 113 } 114 115 /* #### Vector Utility functions ######################### */ 116 117 /* ==== Make a Python Array Obj. from a PyObject, ================ 118 generates a double vector w/ contiguous memory which may be a new allocation if 119 the original was not a double type or contiguous 120 !! Must DECREF the object returned from this routine unless it is returned to the 121 caller of this routines caller using return PyArray_Return(obj) or 122 PyArray_BuildValue with the "N" construct !!! 123 */ 124 PyArrayObject *pyvector(PyObject *objin) { 125 return (PyArrayObject *) PyArray_ContiguousFromObject(objin, 126 NPY_DOUBLE, 1,1); 127 } 128 /* ==== Create 1D Carray from PyArray ====================== 129 Assumes PyArray is contiguous in memory. */ 130 double *pyvector_to_Carrayptrs(PyArrayObject *arrayin) { 131 int i,n; 132 133 n=arrayin->dimensions[0]; 134 return (double *) arrayin->data; /* pointer to arrayin data as double */ 135 } 136 /* ==== Check that PyArrayObject is a double (Float) type and a vector ============== 137 return 1 if an error and raise exception */ 138 int not_doublevector(PyArrayObject *vec) { 139 if (vec->descr->type_num != NPY_DOUBLE || vec->nd != 1) { 140 PyErr_SetString(PyExc_ValueError, 141 "In not_doublevector: array must be of type Float and 1 dimensional (n)."); 142 return 1; } 143 return 0; 144 } 145 146 /* #### Matrix Extensions ############################## */ 147 148 /* ==== Row x 2 function - manipulate matrix in place ====================== 149 Multiply the 2nd row of the input by 2 and put in output 150 interface: rowx2(mat1, mat2) 151 mat1 and mat2 are NumPy matrices 152 Returns integer 1 if successful */ 153 static PyObject *rowx2(PyObject *self, PyObject *args) 154 { 155 PyArrayObject *matin, *matout; // The python objects to be extracted from the args 156 double **cin, **cout; // The C matrices to be created to point to the 157 // python matrices, cin and cout point to the rows 158 // of matin and matout, respectively 159 int i,j,n,m; 160 161 /* Parse tuples separately since args will differ between C fcns */ 162 if (!PyArg_ParseTuple(args, "O!O!", &PyArray_Type, &matin, 163 &PyArray_Type, &matout)) return NULL; 164 if (NULL == matin) return NULL; 165 if (NULL == matout) return NULL; 166 167 /* Check that objects are 'double' type and matrices 168 Not needed if python wrapper function checks before call to this routine */ 169 if (not_doublematrix(matin)) return NULL; 170 if (not_doublematrix(matout)) return NULL; 171 172 /* Change contiguous arrays into C ** arrays (Memory is Allocated!) */ 173 cin=pymatrix_to_Carrayptrs(matin); 174 cout=pymatrix_to_Carrayptrs(matout); 175 176 /* Get matrix dimensions. */ 177 n=matin->dimensions[0]; 178 m=matin->dimensions[1]; 179 180 /* Operate on the matrices */ 181 for ( i=0; i<n; i++) { 182 for ( j=0; j<m; j++) { 183 if (i==1) cout[i][j]=2.0*cin[i][j]; 184 } } 185 186 /* Free memory, close file and return */ 187 free_Carrayptrs(cin); 188 free_Carrayptrs(cout); 189 return Py_BuildValue("i", 1); 190 } 191 /* ==== Row x 2 function- Version 2. - manipulate matrix in place ====================== 192 Multiply the 2nd row of the input by 2 and put in output 193 interface: rowx2(mat1, mat2) 194 mat1 and mat2 are NumPy matrices 195 Returns integer 1 if successful 196 Uses the utility function pymatrix to make NumPy C objects from PyObjects 197 */ 198 static PyObject *rowx2_v2(PyObject *self, PyObject *args) 199 { 200 PyObject *Pymatin, *Pymatout; // The python objects to be extracted from the args 201 PyArrayObject *matin, *matout; // The python array objects to be extracted from python objects 202 double **cin, **cout; // The C matrices to be created to point to the 203 // python matrices, cin and cout point to the rows 204 // of matin and matout, respectively 205 int i,j,n,m; 206 207 /* Parse tuples separately since args will differ between C fcns */ 208 if (!PyArg_ParseTuple(args, "OO", &Pymatin, &Pymatout)) return NULL; 209 if (NULL == Pymatin) return NULL; 210 if (NULL == Pymatout) return NULL; 211 212 /* Convert Python Objects to Python Array Objects */ 213 matin= pymatrix(Pymatin); 214 matout= pymatrix(Pymatout); 215 216 /* Check that objects are 'double' type and matrices 217 Not needed if python wrapper function checks before call to this routine */ 218 if (not_doublematrix(matin)) return NULL; 219 if (not_doublematrix(matout)) return NULL; 220 221 /* Change contiguous arrays into C ** arrays (Memory is Allocated!) */ 222 cin=pymatrix_to_Carrayptrs(matin); 223 cout=pymatrix_to_Carrayptrs(matout); 224 225 /* Get matrix dimensions. */ 226 n=matin->dimensions[0]; 227 m=matin->dimensions[1]; 228 229 /* Operate on the matrices */ 230 for ( i=0; i<n; i++) { 231 for ( j=0; j<m; j++) { 232 if (i==1) cout[i][j]=2.0*cin[i][j]; 233 } } 234 235 /* Free memory, close file and return */ 236 free_Carrayptrs(cin); 237 free_Carrayptrs(cout); 238 return Py_BuildValue("i", 1); 239 } 240 /* ==== Square matrix components function & multiply by int and float ========= 241 Returns a NEW NumPy array 242 interface: matsq(mat1, i1, d1) 243 mat1 is NumPy matrix, i1 is Python integer, d1 is Python float (double) 244 returns a NumPy matrix */ 245 static PyObject *matsq(PyObject *self, PyObject *args) 246 { 247 PyArrayObject *matin, *matout; 248 double **cin, **cout, dfactor; 249 int i,j,n,m, dims[2], ifactor; 250 251 /* Parse tuples separately since args will differ between C fcns */ 252 if (!PyArg_ParseTuple(args, "O!id", 253 &PyArray_Type, &matin, &ifactor, &dfactor)) return NULL; 254 if (NULL == matin) return NULL; 255 256 /* Check that object input is 'double' type and a matrix 257 Not needed if python wrapper function checks before call to this routine */ 258 if (not_doublematrix(matin)) return NULL; 259 260 /* Get the dimensions of the input */ 261 n=dims[0]=matin->dimensions[0]; 262 m=dims[1]=matin->dimensions[1]; 263 264 /* Make a new double matrix of same dims */ 265 matout=(PyArrayObject *) PyArray_FromDims(2,dims,NPY_DOUBLE); 266 267 /* Change contiguous arrays into C ** arrays (Memory is Allocated!) */ 268 cin=pymatrix_to_Carrayptrs(matin); 269 cout=pymatrix_to_Carrayptrs(matout); 270 271 /* Do the calculation. */ 272 for ( i=0; i<n; i++) { 273 for ( j=0; j<m; j++) { 274 cout[i][j]= ifactor*dfactor*cin[i][j]*cin[i][j]; 275 } } 276 277 /* Free memory, close file and return */ 278 free_Carrayptrs(cin); 279 free_Carrayptrs(cout); 280 return PyArray_Return(matout); 281 } 282 283 /* ==== Operate on Matrix components as contiguous memory ========================= 284 Shows how to access the array data as a contiguous block of memory. Used, for example, 285 in matrix classes implemented as contiquous memory rather than as n arrays of 286 pointers to the data "rows" 287 288 Returns a NEW NumPy array 289 interface: contigmat(mat1, x1) 290 mat1 is NumPy matrix, x1 is Python float (double) 291 returns a NumPy matrix */ 292 static PyObject *contigmat(PyObject *self, PyObject *args) 293 { 294 PyArrayObject *matin, *matout; 295 double *cin, *cout, x1; // Pointers to the contiguous data in the matrices to 296 // be used by C (e.g. passed to a program that uses 297 // matrix classes implemented as contiquous memory rather 298 // than as n arrays of pointers to the data "rows" 299 int i,j,n,m, dims[2], ncomps; // ncomps=n*m=total number of matrix components in mat1 300 301 /* Parse tuples separately since args will differ between C fcns */ 302 if (!PyArg_ParseTuple(args, "O!d", 303 &PyArray_Type, &matin, &x1)) return NULL; 304 if (NULL == matin) return NULL; 305 306 /* Check that object input is 'double' type and a matrix 307 Not needed if python wrapper function checks before call to this routine */ 308 if (not_doublematrix(matin)) return NULL; 309 310 /* Get the dimensions of the input */ 311 n=dims[0]=matin->dimensions[0]; 312 m=dims[1]=matin->dimensions[1]; 313 ncomps=n*m; 314 315 /* Make a new double matrix of same dims */ 316 matout=(PyArrayObject *) PyArray_FromDims(2,dims,NPY_DOUBLE); 317 318 /* Change contiguous arrays into C * arrays pointers to PyArrayObject data */ 319 cin=pyvector_to_Carrayptrs(matin); 320 cout=pyvector_to_Carrayptrs(matout); 321 322 /* Do the calculation. */ 323 printf("In contigmat, cout (as contiguous memory) =\n"); 324 for ( i=0; i<ncomps; i++) { 325 cout[i]= cin[i]-x1; 326 printf("%e ",cout[i]); 327 } 328 printf("\n"); 329 330 return PyArray_Return(matout); 331 } 332 333 /* #### Matrix Utility functions ######################### */ 334 335 /* ==== Make a Python Array Obj. from a PyObject, ================ 336 generates a double matrix w/ contiguous memory which may be a new allocation if 337 the original was not a double type or contiguous 338 !! Must DECREF the object returned from this routine unless it is returned to the 339 caller of this routines caller using return PyArray_Return(obj) or 340 PyArray_BuildValue with the "N" construct !!! 341 */ 342 PyArrayObject *pymatrix(PyObject *objin) { 343 return (PyArrayObject *) PyArray_ContiguousFromObject(objin, 344 NPY_DOUBLE, 2,2); 345 } 346 /* ==== Create Carray from PyArray ====================== 347 Assumes PyArray is contiguous in memory. 348 Memory is allocated! */ 349 double **pymatrix_to_Carrayptrs(PyArrayObject *arrayin) { 350 double **c, *a; 351 int i,n,m; 352 353 n=arrayin->dimensions[0]; 354 m=arrayin->dimensions[1]; 355 c=ptrvector(n); 356 a=(double *) arrayin->data; /* pointer to arrayin data as double */ 357 for ( i=0; i<n; i++) { 358 c[i]=a+i*m; } 359 return c; 360 } 361 /* ==== Allocate a double *vector (vec of pointers) ====================== 362 Memory is Allocated! See void free_Carray(double ** ) */ 363 double **ptrvector(long n) { 364 double **v; 365 v=(double **)malloc((size_t) (n*sizeof(double))); 366 if (!v) { 367 printf("In **ptrvector. Allocation of memory for double array failed."); 368 exit(0); } 369 return v; 370 } 371 /* ==== Free a double *vector (vec of pointers) ========================== */ 372 void free_Carrayptrs(double **v) { 373 free((char*) v); 374 } 375 /* ==== Check that PyArrayObject is a double (Float) type and a matrix ============== 376 return 1 if an error and raise exception */ 377 int not_doublematrix(PyArrayObject *mat) { 378 if (mat->descr->type_num != NPY_DOUBLE || mat->nd != 2) { 379 PyErr_SetString(PyExc_ValueError, 380 "In not_doublematrix: array must be of type Float and 2 dimensional (n x m)."); 381 return 1; } 382 return 0; 383 } 384 385 /* #### Integer 2D Array Extensions ############################## */ 386 387 /* ==== Integer function - manipulate integer 2D array in place ====================== 388 Replace >=0 integer with 1 and < 0 integer with 0 and put in output 389 interface: intfcn1(int1, afloat) 390 int1 is a NumPy integer 2D array, afloat is a Python float 391 Returns integer 1 if successful */ 392 static PyObject *intfcn1(PyObject *self, PyObject *args) 393 { 394 PyArrayObject *intin, *intout; // The python objects to be extracted from the args 395 int **cin, **cout; // The C integer 2D arrays to be created to point to the 396 // python integer 2D arrays, cin and cout point to the rows 397 // of intin and intout, respectively 398 int i,j,n,m, dims[2]; 399 double afloat; 400 401 /* Parse tuples separately since args will differ between C fcns */ 402 if (!PyArg_ParseTuple(args, "O!d", 403 &PyArray_Type, &intin, &afloat)) return NULL; 404 if (NULL == intin) return NULL; 405 406 printf("In intfcn1, the input Python float = %e, a C double\n",afloat); 407 408 /* Check that object input is int type and a 2D array 409 Not needed if python wrapper function checks before call to this routine */ 410 if (not_int2Darray(intin)) return NULL; 411 412 /* Get the dimensions of the input */ 413 n=dims[0]=intin->dimensions[0]; 414 m=dims[1]=intin->dimensions[1]; 415 416 /* Make a new int array of same dims */ 417 intout=(PyArrayObject *) PyArray_FromDims(2,dims,NPY_LONG); 418 419 /* Change contiguous arrays into C ** arrays (Memory is Allocated!) */ 420 cin=pyint2Darray_to_Carrayptrs(intin); 421 cout=pyint2Darray_to_Carrayptrs(intout); 422 423 /* Do the calculation. */ 424 for ( i=0; i<n; i++) { 425 for ( j=0; j<m; j++) { 426 if (cin[i][j] >= 0) { 427 cout[i][j]= 1; } 428 else { 429 cout[i][j]= 0; } 430 } } 431 432 printf("In intfcn1, the output array is,\n\n"); 433 434 for ( i=0; i<n; i++) { 435 for ( j=0; j<m; j++) { 436 printf("%d ",cout[i][j]); 437 } 438 printf("\n"); 439 } 440 printf("\n"); 441 442 /* Free memory, close file and return */ 443 free_Cint2Darrayptrs(cin); 444 free_Cint2Darrayptrs(cout); 445 return PyArray_Return(intout); 446 } 447 /* #### Integer Array Utility functions ######################### */ 448 449 /* ==== Make a Python int Array Obj. from a PyObject, ================ 450 generates a 2D integer array w/ contiguous memory which may be a new allocation if 451 the original was not an integer type or contiguous 452 !! Must DECREF the object returned from this routine unless it is returned to the 453 caller of this routines caller using return PyArray_Return(obj) or 454 PyArray_BuildValue with the "N" construct !!! 455 */ 456 PyArrayObject *pyint2Darray(PyObject *objin) { 457 return (PyArrayObject *) PyArray_ContiguousFromObject(objin, 458 NPY_LONG, 2,2); 459 } 460 /* ==== Create integer 2D Carray from PyArray ====================== 461 Assumes PyArray is contiguous in memory. 462 Memory is allocated! */ 463 int **pyint2Darray_to_Carrayptrs(PyArrayObject *arrayin) { 464 int **c, *a; 465 int i,n,m; 466 467 n=arrayin->dimensions[0]; 468 m=arrayin->dimensions[1]; 469 c=ptrintvector(n); 470 a=(int *) arrayin->data; /* pointer to arrayin data as int */ 471 for ( i=0; i<n; i++) { 472 c[i]=a+i*m; } 473 return c; 474 } 475 /* ==== Allocate a a *int (vec of pointers) ====================== 476 Memory is Allocated! See void free_Carray(int ** ) */ 477 int **ptrintvector(long n) { 478 int **v; 479 v=(int **)malloc((size_t) (n*sizeof(int))); 480 if (!v) { 481 printf("In **ptrintvector. Allocation of memory for int array failed."); 482 exit(0); } 483 return v; 484 } 485 /* ==== Free an int *vector (vec of pointers) ========================== */ 486 void free_Cint2Darrayptrs(int **v) { 487 free((char*) v); 488 } 489 /* ==== Check that PyArrayObject is an int (integer) type and a 2D array ============== 490 return 1 if an error and raise exception 491 Note: Use NY_LONG for NumPy integer array, not NP_INT */ 492 int not_int2Darray(PyArrayObject *mat) { 493 if (mat->descr->type_num != NPY_LONG || mat->nd != 2) { 494 PyErr_SetString(PyExc_ValueError, 495 "In not_int2Darray: array must be of type int and 2 dimensional (n x m)."); 496 return 1; } 497 return 0; 498 } 499 500 501 502 503 504 505 506 // EOF 507 508 509 510 511

Now, lets look at the source code in smaller chunks.

#### Headers

You must include the following headers with Python.h **always** the first header included.

#include "Python.h" #include "arrayobject.h"

I also include the header C_arraytest.h which contains the prototype of the matsq function:

static PyObject *matsq(PyObject *self, PyObject *args);

The static keyword in front of a function declaration makes this function private to your extension module. The linker just won't see it. This way you can use the same intuitional function names(i.e. sum, check, trace) for all extension modules without having name clashes between them at link time. The type of the function is `PyObject *` because it will always be returning to a Python calling function so you can (must, actually) return a Python object. The arguments are always the same ,

PyObject *self and PyObject *args

The first one self is never used, but necessary because of how Python passes arguments. The second args is a pointer to a Python tuple that contains all of the arguments (B,i,x) of the function.

#### Method definitions

This sets up a table of function names that will be the interface from your Python code to your C extension. The name of the C extension module will be `_C_arraytest` (note the leading underscore). It is important to get the name right each time it is used because there are strict requirements on using the module name in the code. The name appears first in the method definitions table as the first part of the table name:

static PyMethodDef _C_arraytestMethods[] = { ..., {"matsq", matsq, METH_VARARGS}, ..., {NULL, NULL} /* Sentinel - marks the end of this structure */ };

where I used ellipses (...) to ignore other code not relevant to this function. The `METH_VARARGS` parameter tells the compiler that you will pass the arguments the usual way without keywords as in the example `A=matsq(B,i,x)` above. There are ways to use Python keywords, but I have not tried them out. The table should always end with {NULL, NULL} which is just a "marker" to note the end of the table.

#### Initializations

These functions tell the Python interpreter what to call when the module is loaded. Note the name of the module (`_C_arraytest`) must come directly after the init in the name of the initialization structure.

void init_C_arraytest() { (void) Py_InitModule("_C_arraytest", _C_arraytestMethods); import_array(); // Must be present for NumPy. Called first after above line. }

The order is important and you must call these two initialization functions first.

#### The matsqfunction code

Now here is the actual function that you will call from Python code. I will split it up and explain each section.

The function name and type:

static PyObject *matsq(PyObject *self, PyObject *args)

You can see they match the prototype in C_arraytest.h.

The local variables:

{ PyArrayObject *matin, *matout; double **cin, **cout, dfactor; int i,j,n,m, dims[2], ifactor;

The `PyArrayObjects` are structures defined in the NumPy header file and they will be assigned pointers to the actual input and output NumPy arrays (A and B). The C arrays `cin` and `cout` are Cpointers that will point (eventually) to the actual data in the NumPy arrays and allow you to manipulate it. The variable `dfactor` will be the Python float y, `ifactor` will be the Python int i, the variables i,j,n, and m will be loop variables (i and j) and matrix dimensions (n= number of rows, m= number of columns) in A and B. The array dims will be used to access n and m from the NumPy array. All this happens below. First we have to extract the input variables (A, i, y) from the args tuple. This is done by the call,

{ /* Parse tuples separately since args will differ between C fcns */ if (!PyArg_ParseTuple(args, "O!id", &PyArray_Type, &matin, &ifactor, &dfactor)) return NULL;

The `PyArg_ParseTuple` function takes the args tuple and using the format string that appears next ("O!id" ) it assigns each member of the tuple to a C variable. Note you must pass all C variables by reference. This is true even if the C variable is a pointer to a string (see code in vecfcn1 routine). The format string tells the parsing function what type of variable to use. The common variables for Python all have letter names (e.g. s for string, i for integer, d for (double - the Python float)). You can find a list of these and many more in Guido's tutorial (http://docs.python.org/ext/ext.html). For data types that are not in standard Python like the NumPy arrays you use the O! notation which tells the parser to look for a type structure (in this case a NumPy structure `PyArray_Type`) to help it convert the tuple member that will be assigned to the local variable ( matin ) pointing to the NumPy array structure. Note these are also passed by reference. The order must be maintained and match the calling interface of the Python function you want. The format string defines the interface and if you do not call the function from Python so the number of arguments match the number in the format string, you will get an error. This is good since it will point to where the problem is.

If this doesn't work we return NULL which will cause a Python exception.

if (NULL == matin) return NULL;

Next we have a check that the input matrix really is a matrix of NumPy type double. This test is also done in the Python wrapper for this C extension. It is better to do it there, but I include the test here to show you that you can do testing in the C extension and you can "reach into" the NumPy structure to pick out it's parameters. The utility function `not_doublematrix` is explained later.

/* Check that object input is 'double' type and a matrix Not needed if python wrapper function checks before call to this routine */ if (not_doublematrix(matin)) return NULL;

Here's an example of reaching into the NumPy structure to get the dimensions of the matrix matin and assign them to local variables as mentioned above.

/* Get the dimensions of the input */ n=dims[0]=matin->dimensions[0]; m=dims[1]=matin->dimensions[1];

Now we use these matrix parameters to generate a new NumPy matrix matout (our output) right here in our C extension. PyArray_FromDims(2,dims,NPY_DOUBLE) is a utility function provided by NumPy (not me) and its arguments tell NumPy the rank of the NumPy object (2), the size of each dimension (dims), and the data type (NPY_DOUBLE). Other examples of creating different NumPy arrays are in the other C extensions.

/* Make a new double matrix of same dims */ matout=(PyArrayObject *) PyArray_FromDims(2,dims,NPY_DOUBLE);

To actually do our calculations we need C structures to handle our data so we generate two C 2-dimensional arrays (cin and cout) which will point to the data in matin and matout, respectively. Note, here memory is allocated since we need to create an array of pointers to C doubles so we can address cin and cout like usual C matrices with two indices. This memory must be released at the end of this C extension. Memory allocation like this is not always necessary. See the routines for NumPy vector manipulation and treating NumPy matrices like contiguous arrays (as they are in NumPy) in the C extension (the routine contigmat).

/* Change contiguous arrays into C ** arrays (Memory is Allocated!) */ cin=pymatrix_to_Carrayptrs(matin); cout=pymatrix_to_Carrayptrs(matout);

Finally, we get to the point where we can manipulate the matrices and do our calculations. Here is the part where the original equation operations *B _{ij}*=

*i y*(

*A*)

_{ij}^{2}are carried out. Note, we are directly manipulating the data in the original NumPy arrays A and B passed to this extension. So anything you do here to the components of cin or cout will be done to the original matrices and will appear there when you return to the Python code.

/* Do the calculation. */ for ( i=0; i<n; i++) { for ( j=0; j<m; j++) { cout[i][j]= ifactor*dfactor*cin[i][j]*cin[i][j]; } }

We are ready to go back to the Python calling routine, but first we release the memory we allocated for cin and cout.

/* Free memory, close file and return */ free_Carrayptrs(cin); free_Carrayptrs(cout);

Now we return the result of the calculation.

return PyArray_Return(matout); }

If you look at the other C extensions you can see that you can also return regular Python variables (like ints) using another Python-provided function `Py_BuildValue("i", 1)` where the string "i" tells the function the data type and the second argument (1 here) is the data value to be returned. If you decide to return nothing, you **must** return the Python keyword None like this:

Py_INCREF(Py_None); return Py_None;

The Py_INCREF function increases the number of references to None (remember Python collects allocated memory automatically when there are no more references to the data). You must be careful about this in the C extensions. For more info see Guido´s tutorial.

#### The utility functions

Here are some quick descriptions of the matrix utility functions. They are pretty much self-explanatory. The vector and integer array utility functions are very similar.

The first utility function is not used in any of the C extensions here, but I include it because a helpful person sent it along with some code and it does show how one might convert a python object to a NumPy array. I have not tried it. Use at your own risk.

PyArrayObject *pymatrix(PyObject *objin) { return (PyArrayObject *) PyArray_ContiguousFromObject(objin, NPY_DOUBLE, 2,2); }

The next one creates the C arrays that are used to point to the rows of the NumPy matrices. This allocates arrays of pointers which point into the NumPy data. The NumPy data is contiguous and strides (m) are used to access each row. This function calls ptrvector(n) which does the actual memory allocation. Remember to deallocate memory after using this one.

double **pymatrix_to_Carrayptrs(PyArrayObject *arrayin) { double **c, *a; int i,n,m; n=arrayin->dimensions[0]; m=arrayin->dimensions[1]; c=ptrvector(n); a=(double *) arrayin->data; /* pointer to arrayin data as double */ for ( i=0; i<n; i++) { c[i]=a+i*m; } return c; }

Here is where the memory for the C arrays of pointers is allocated. It's a pretty standard memory allocator for arrays.

double **ptrvector(long n) { double **v; v=(double **)malloc((size_t) (n*sizeof(double))); if (!v) { printf("In **ptrvector. Allocation of memory for double array failed."); exit(0); } return v; }

This is the routine to deallocate the memory.

void free_Carrayptrs(double **v) { free((char*) v); }

**Note**: There is a standard C-API for converting from Python objects to C-style arrays-of-pointers called PyArray_AsCArray

Here is a utility function that checks to make sure the object produced by the parse is a NumPy matrix. You can see how it reaches into the NumPy object structure.

int not_doublematrix(PyArrayObject *mat) { if (mat->descr->type_num != NPY_DOUBLE || mat->nd != 2) { PyErr_SetString(PyExc_ValueError, "In not_doublematrix: array must be of type Float and 2 dimensional (n x m)."); return 1; } return 0; }

### The C Code – other variations

As I mentioned in the introduction the functions are repetitious. All the other functions follow a very similar pattern. They are given a line in the methods structure, they have the same arguments, they parse the arguments, they may check the C structures after the parsing, they set up C variables to manipulate which point to the input data, they do the actual calculation, they deallocate memory (if necessary) and they return something for Python (either None or a Python object). I'll just mention some of the differences in the code from the above matrix C extension matsq.

**vecfcn1:**

The format string for the parse function specifies that a variable from Python is a string (s).

if (!PyArg_ParseTuple(args, "O!O!sd", &PyArray_Type, &vecin, &PyArray_Type, &vecout, &str, &dfac)) return NULL;

No memory is allocated in the pointer assignments for the local C arrays because they are already vectors.

cin=pyvector_to_Carrayptrs(vecin); cout=pyvector_to_Carrayptrs(vecout);

The return is an int = 1 if successful. This is returned as a Python int.

return Py_BuildValue("i", 1);

**rowx2:**

In this routine we "pass back" the output using the fact that it is passed by reference in the argument tuple list and is changed in place by the manipulations. Compare this to returning an array from the C extension in matsq. Either one gets the job done.

**contigmat:**

Here the matrix data is treated like a long vector (just like stacking the rows end to end). This is useful if you have array classes in C++ which store the data as one long vector and then use strides to access it like an array (two-dimensional, three-dimensional, or whatever). Even though matin and matout are "matrices" we treat them like vectors and use the vector utilities to get our C pointers cin and cout.

/* Change contiguous arrays into C * arrays pointers to PyArrayObject data */ cin=pyvector_to_Carrayptrs(matin); cout=pyvector_to_Carrayptrs(matout);

For other utility functions note that we use different rank, dimensions, and NumPy parameters (e.g. `NPY_LONG`) to tell the routines we are calling what the data types are.

### The Make File

The make file is very simple. It is written for Mac OS X 10.4, as BSD Unix.

# ---- Link --------------------------- _C_arraytest.so: C_arraytest.o C_arraytest.mak gcc -bundle -flat_namespace -undefined suppress -o _C_arraytest.so C_arraytest.o # ---- gcc C compile ------------------ C_arraytest.o: C_arraytest.c C_arraytest.h C_arraytest.mak gcc -c C_arraytest.c -I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4 -I/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/include/numpy/

The compile step is pretty standard. You do need to add paths to the Python headers:

`-I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4`

and paths to NumPy headers:

`-I/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/include/numpy/`

These paths are for a Framework Python 2.4 install on Mac OS X. You need to supply paths to the headers installed on your computer. They may be different. My guess is the gcc flags will be the same for the compile.

The link step produces the actual module (`_C_arraytest.so`) which can be imported to Python code. This is specific to the Mac OS X system. On Linux or Windows you will need something different. I have been searching for generic examples, but I'm not sure what I found would work for most people so I chose not to display the findings there. I cannot judge whether the code is good for those systems.

Note, again the name of the produced shared library **must** match the name in the initialization and methods definition calls in the C extension source code. Hence the leading underline in the name `_C_arraytest.so`.

Here's my modified Makefile which compiles this code under Linux (save it as Makefile in the same directory, and then run 'make' --PaulIvanov

# ---- Link --------------------------- _C_arraytest.so: C_arraytest.o gcc -shared C_arraytest.o -o _C_arraytest.so # ---- gcc C compile ------------------ C_arraytest.o: C_arraytest.c C_arraytest.h gcc -c C_arraytest.c -I/usr/include/python2.4 -I/usr/lib/python2.4/site-packages/numpy/core/include/numpy

### The Python Wrapper Code

Here as in the C code I will only show detailed description of one wrapper function and its use. There is so much repetition that the other wrappers will be clear if you understand one. I will again use the matsq function. This is the code that will first be called when you invoke the matsq function in your own Python code after importing the wrapper module (`C_arraytest.py`) which automatically imports and uses (in a way hidden from the user) the actual C extensions in `_C_arraytest.so` (note the leading underscore which keeps the names separate).

**imports:**

Import the C extensions, NumPy, and the system module (used for the exit statement at the end which is optional).

1 import _C_arraytest 2 import numpy as NP 3 import sys

**The definition of the Python matsq function**

Pass a NumPy matrix (matin), a Python int (ifac), and a Python float (dfac). Check the arguments to make sure they are the right type and dimensions and size. This is much easier and safer on the Python side which is why I do it here even though I showed a way to do some of this in the C extensions.

1 def matsq(matin, ifac, dfac): 2 # .... Check arguments, double NumPy matrices? 3 test=NP.zeros((2,2)) # create a NumPy matrix as a test object to check matin 4 typetest= type(test) # get its type 5 datatest=test.dtype # get data type 6 if type(matin) != typetest: 7 raise 'In matsq, matrix argument is not *NumPy* array' 8 if len(NP.shape(matin)) != 2: 9 raise 'In matsq, matrix argument is not NumPy *matrix*' 10 if matin.dtype != datatest: 11 raise 'In matsq, matrix argument is not *Float* NumPy matrix' 12 if type(ifac) != type(1): 13 raise 'In matsq, ifac argument is not an int' 14 if type(dfac) != type(1.0): 15 raise 'In matsq, dfac argument is not a python float'

Finally, call the C extension to do the actual calculation on matin.

1 # .... Call C extension function 2 return _C_arraytest.matsq(matin, ifac, dfac)

You can see that the python part is the easiest.

**Using your C extension**

If the test function `mattest2` were in another module (one you were writing), here's how you would use it to call the wrapped matsq function in a script.

1 from C_arraytest import * # Note, NO underscore since you are importing the wrapper. 2 3 import numpy as NP 4 import sys 5 6 def mattest2(): 7 print "\n--- Test matsq ------------------------------" 8 print " Each 2nd matrix component should = square of 1st matrix component x ifac x dfac \n" 9 n=4 # Number of columns # Make 2 x n matrices 10 z=NP.arange(float(n)) 11 x=NP.array([z,3.0*z]) 12 jfac=2 13 xfac=1.5 14 y=matsq(x, jfac, xfac) 15 print "x=",x 16 print "y=",y 17 18 if __name__ == '__main__': 19 mattest2()

The output looks like this:

--- Test matsq ------------------------------ Each 2nd matrix component should = square of 1st matrix component x ifac x dfac x= [[ 0. 1. 2. 3.] [ 0. 3. 6. 9.]] y= [[ 0. 3. 12. 27.] [ 0. 27. 108. 243.]]

The output of all the test functions is in the file `C_arraytest` output.

## Summary

This is the first draft explaining the C extensions I wrote (with help). If you have comments on the code, mistakes, etc. Please post them on the pythonmac email list. I will see them.

*...or better edit the wiki!*

I wrote the C extensions to work with NumPy although they were originally written for Numeric. If you must use Numeric you should test them to see if they are compatible. I suspect names like `NPY_DOUBLE`, for example, will not be. I strongly suggest you upgrade to the NumPy since it is the future of Numeric in Python. It's worth the effort.

# Comments?!

Note that this line, while in the header file above, is missing from the .h in the tar-ball.

static PyObject *rowx2_v2(PyObject *self, PyObject *args);

-- PaulIvanov

The output file name should be _C_arraytest_d.pyd for Debug version and _C_arraytest.pyd for Release version.

- -- Geoffrey Zhu

ptrvector() allocates n*sizeof(double), but should really allocate pointers to double; so: n*sizeof(double *)

- -- Peter Meerwald

- In vecsq(), line 86, since you will be creating a 1-dimensional vector:
int i,j,n,m, dims[2];

should be:int i,j,n,m, dims[1];

In

`pyvector_to_Carrayptrs()`,`n`is never used.