Back to home page

EIC code displayed by LXR

 
 

    


Warning, /jana2/src/python/externals/pybind11-2.10.3/docs/advanced/pycpp/numpy.rst is written in an unsupported language. File is not indexed.

0001 .. _numpy:
0002 
0003 NumPy
0004 #####
0005 
0006 Buffer protocol
0007 ===============
0008 
0009 Python supports an extremely general and convenient approach for exchanging
0010 data between plugin libraries. Types can expose a buffer view [#f2]_, which
0011 provides fast direct access to the raw internal data representation. Suppose we
0012 want to bind the following simplistic Matrix class:
0013 
0014 .. code-block:: cpp
0015 
0016     class Matrix {
0017     public:
0018         Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
0019             m_data = new float[rows*cols];
0020         }
0021         float *data() { return m_data; }
0022         size_t rows() const { return m_rows; }
0023         size_t cols() const { return m_cols; }
0024     private:
0025         size_t m_rows, m_cols;
0026         float *m_data;
0027     };
0028 
0029 The following binding code exposes the ``Matrix`` contents as a buffer object,
0030 making it possible to cast Matrices into NumPy arrays. It is even possible to
0031 completely avoid copy operations with Python expressions like
0032 ``np.array(matrix_instance, copy = False)``.
0033 
0034 .. code-block:: cpp
0035 
0036     py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
0037        .def_buffer([](Matrix &m) -> py::buffer_info {
0038             return py::buffer_info(
0039                 m.data(),                               /* Pointer to buffer */
0040                 sizeof(float),                          /* Size of one scalar */
0041                 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
0042                 2,                                      /* Number of dimensions */
0043                 { m.rows(), m.cols() },                 /* Buffer dimensions */
0044                 { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
0045                   sizeof(float) }
0046             );
0047         });
0048 
0049 Supporting the buffer protocol in a new type involves specifying the special
0050 ``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
0051 ``def_buffer()`` method with a lambda function that creates a
0052 ``py::buffer_info`` description record on demand describing a given matrix
0053 instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
0054 specification.
0055 
0056 .. code-block:: cpp
0057 
0058     struct buffer_info {
0059         void *ptr;
0060         py::ssize_t itemsize;
0061         std::string format;
0062         py::ssize_t ndim;
0063         std::vector<py::ssize_t> shape;
0064         std::vector<py::ssize_t> strides;
0065     };
0066 
0067 To create a C++ function that can take a Python buffer object as an argument,
0068 simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
0069 in a great variety of configurations, hence some safety checks are usually
0070 necessary in the function body. Below, you can see a basic example on how to
0071 define a custom constructor for the Eigen double precision matrix
0072 (``Eigen::MatrixXd``) type, which supports initialization from compatible
0073 buffer objects (e.g. a NumPy matrix).
0074 
0075 .. code-block:: cpp
0076 
0077     /* Bind MatrixXd (or some other Eigen type) to Python */
0078     typedef Eigen::MatrixXd Matrix;
0079 
0080     typedef Matrix::Scalar Scalar;
0081     constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
0082 
0083     py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
0084         .def(py::init([](py::buffer b) {
0085             typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
0086 
0087             /* Request a buffer descriptor from Python */
0088             py::buffer_info info = b.request();
0089 
0090             /* Some basic validation checks ... */
0091             if (info.format != py::format_descriptor<Scalar>::format())
0092                 throw std::runtime_error("Incompatible format: expected a double array!");
0093 
0094             if (info.ndim != 2)
0095                 throw std::runtime_error("Incompatible buffer dimension!");
0096 
0097             auto strides = Strides(
0098                 info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
0099                 info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
0100 
0101             auto map = Eigen::Map<Matrix, 0, Strides>(
0102                 static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
0103 
0104             return Matrix(map);
0105         }));
0106 
0107 For reference, the ``def_buffer()`` call for this Eigen data type should look
0108 as follows:
0109 
0110 .. code-block:: cpp
0111 
0112     .def_buffer([](Matrix &m) -> py::buffer_info {
0113         return py::buffer_info(
0114             m.data(),                                /* Pointer to buffer */
0115             sizeof(Scalar),                          /* Size of one scalar */
0116             py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
0117             2,                                       /* Number of dimensions */
0118             { m.rows(), m.cols() },                  /* Buffer dimensions */
0119             { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
0120               sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
0121                                                      /* Strides (in bytes) for each index */
0122         );
0123      })
0124 
0125 For a much easier approach of binding Eigen types (although with some
0126 limitations), refer to the section on :doc:`/advanced/cast/eigen`.
0127 
0128 .. seealso::
0129 
0130     The file :file:`tests/test_buffers.cpp` contains a complete example
0131     that demonstrates using the buffer protocol with pybind11 in more detail.
0132 
0133 .. [#f2] http://docs.python.org/3/c-api/buffer.html
0134 
0135 Arrays
0136 ======
0137 
0138 By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
0139 restrict the function so that it only accepts NumPy arrays (rather than any
0140 type of Python object satisfying the buffer protocol).
0141 
0142 In many situations, we want to define a function which only accepts a NumPy
0143 array of a certain data type. This is possible via the ``py::array_t<T>``
0144 template. For instance, the following function requires the argument to be a
0145 NumPy array containing double precision values.
0146 
0147 .. code-block:: cpp
0148 
0149     void f(py::array_t<double> array);
0150 
0151 When it is invoked with a different type (e.g. an integer or a list of
0152 integers), the binding code will attempt to cast the input into a NumPy array
0153 of the requested type. This feature requires the :file:`pybind11/numpy.h`
0154 header to be included. Note that :file:`pybind11/numpy.h` does not depend on
0155 the NumPy headers, and thus can be used without declaring a build-time
0156 dependency on NumPy; NumPy>=1.7.0 is a runtime dependency.
0157 
0158 Data in NumPy arrays is not guaranteed to packed in a dense manner;
0159 furthermore, entries can be separated by arbitrary column and row strides.
0160 Sometimes, it can be useful to require a function to only accept dense arrays
0161 using either the C (row-major) or Fortran (column-major) ordering. This can be
0162 accomplished via a second template argument with values ``py::array::c_style``
0163 or ``py::array::f_style``.
0164 
0165 .. code-block:: cpp
0166 
0167     void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
0168 
0169 The ``py::array::forcecast`` argument is the default value of the second
0170 template parameter, and it ensures that non-conforming arguments are converted
0171 into an array satisfying the specified requirements instead of trying the next
0172 function overload.
0173 
0174 There are several methods on arrays; the methods listed below under references
0175 work, as well as the following functions based on the NumPy API:
0176 
0177 - ``.dtype()`` returns the type of the contained values.
0178 
0179 - ``.strides()`` returns a pointer to the strides of the array (optionally pass
0180   an integer axis to get a number).
0181 
0182 - ``.flags()`` returns the flag settings. ``.writable()`` and ``.owndata()``
0183   are directly available.
0184 
0185 - ``.offset_at()`` returns the offset (optionally pass indices).
0186 
0187 - ``.squeeze()`` returns a view with length-1 axes removed.
0188 
0189 - ``.view(dtype)`` returns a view of the array with a different dtype.
0190 
0191 - ``.reshape({i, j, ...})`` returns a view of the array with a different shape.
0192   ``.resize({...})`` is also available.
0193 
0194 - ``.index_at(i, j, ...)`` gets the count from the beginning to a given index.
0195 
0196 
0197 There are also several methods for getting references (described below).
0198 
0199 Structured types
0200 ================
0201 
0202 In order for ``py::array_t`` to work with structured (record) types, we first
0203 need to register the memory layout of the type. This can be done via
0204 ``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
0205 expects the type followed by field names:
0206 
0207 .. code-block:: cpp
0208 
0209     struct A {
0210         int x;
0211         double y;
0212     };
0213 
0214     struct B {
0215         int z;
0216         A a;
0217     };
0218 
0219     // ...
0220     PYBIND11_MODULE(test, m) {
0221         // ...
0222 
0223         PYBIND11_NUMPY_DTYPE(A, x, y);
0224         PYBIND11_NUMPY_DTYPE(B, z, a);
0225         /* now both A and B can be used as template arguments to py::array_t */
0226     }
0227 
0228 The structure should consist of fundamental arithmetic types, ``std::complex``,
0229 previously registered substructures, and arrays of any of the above. Both C++
0230 arrays and ``std::array`` are supported. While there is a static assertion to
0231 prevent many types of unsupported structures, it is still the user's
0232 responsibility to use only "plain" structures that can be safely manipulated as
0233 raw memory without violating invariants.
0234 
0235 Vectorizing functions
0236 =====================
0237 
0238 Suppose we want to bind a function with the following signature to Python so
0239 that it can process arbitrary NumPy array arguments (vectors, matrices, general
0240 N-D arrays) in addition to its normal arguments:
0241 
0242 .. code-block:: cpp
0243 
0244     double my_func(int x, float y, double z);
0245 
0246 After including the ``pybind11/numpy.h`` header, this is extremely simple:
0247 
0248 .. code-block:: cpp
0249 
0250     m.def("vectorized_func", py::vectorize(my_func));
0251 
0252 Invoking the function like below causes 4 calls to be made to ``my_func`` with
0253 each of the array elements. The significant advantage of this compared to
0254 solutions like ``numpy.vectorize()`` is that the loop over the elements runs
0255 entirely on the C++ side and can be crunched down into a tight, optimized loop
0256 by the compiler. The result is returned as a NumPy array of type
0257 ``numpy.dtype.float64``.
0258 
0259 .. code-block:: pycon
0260 
0261     >>> x = np.array([[1, 3], [5, 7]])
0262     >>> y = np.array([[2, 4], [6, 8]])
0263     >>> z = 3
0264     >>> result = vectorized_func(x, y, z)
0265 
0266 The scalar argument ``z`` is transparently replicated 4 times.  The input
0267 arrays ``x`` and ``y`` are automatically converted into the right types (they
0268 are of type  ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
0269 ``numpy.dtype.float32``, respectively).
0270 
0271 .. note::
0272 
0273     Only arithmetic, complex, and POD types passed by value or by ``const &``
0274     reference are vectorized; all other arguments are passed through as-is.
0275     Functions taking rvalue reference arguments cannot be vectorized.
0276 
0277 In cases where the computation is too complicated to be reduced to
0278 ``vectorize``, it will be necessary to create and access the buffer contents
0279 manually. The following snippet contains a complete example that shows how this
0280 works (the code is somewhat contrived, since it could have been done more
0281 simply using ``vectorize``).
0282 
0283 .. code-block:: cpp
0284 
0285     #include <pybind11/pybind11.h>
0286     #include <pybind11/numpy.h>
0287 
0288     namespace py = pybind11;
0289 
0290     py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
0291         py::buffer_info buf1 = input1.request(), buf2 = input2.request();
0292 
0293         if (buf1.ndim != 1 || buf2.ndim != 1)
0294             throw std::runtime_error("Number of dimensions must be one");
0295 
0296         if (buf1.size != buf2.size)
0297             throw std::runtime_error("Input shapes must match");
0298 
0299         /* No pointer is passed, so NumPy will allocate the buffer */
0300         auto result = py::array_t<double>(buf1.size);
0301 
0302         py::buffer_info buf3 = result.request();
0303 
0304         double *ptr1 = static_cast<double *>(buf1.ptr);
0305         double *ptr2 = static_cast<double *>(buf2.ptr);
0306         double *ptr3 = static_cast<double *>(buf3.ptr);
0307 
0308         for (size_t idx = 0; idx < buf1.shape[0]; idx++)
0309             ptr3[idx] = ptr1[idx] + ptr2[idx];
0310 
0311         return result;
0312     }
0313 
0314     PYBIND11_MODULE(test, m) {
0315         m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
0316     }
0317 
0318 .. seealso::
0319 
0320     The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
0321     example that demonstrates using :func:`vectorize` in more detail.
0322 
0323 Direct access
0324 =============
0325 
0326 For performance reasons, particularly when dealing with very large arrays, it
0327 is often desirable to directly access array elements without internal checking
0328 of dimensions and bounds on every access when indices are known to be already
0329 valid.  To avoid such checks, the ``array`` class and ``array_t<T>`` template
0330 class offer an unchecked proxy object that can be used for this unchecked
0331 access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
0332 where ``N`` gives the required dimensionality of the array:
0333 
0334 .. code-block:: cpp
0335 
0336     m.def("sum_3d", [](py::array_t<double> x) {
0337         auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
0338         double sum = 0;
0339         for (py::ssize_t i = 0; i < r.shape(0); i++)
0340             for (py::ssize_t j = 0; j < r.shape(1); j++)
0341                 for (py::ssize_t k = 0; k < r.shape(2); k++)
0342                     sum += r(i, j, k);
0343         return sum;
0344     });
0345     m.def("increment_3d", [](py::array_t<double> x) {
0346         auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
0347         for (py::ssize_t i = 0; i < r.shape(0); i++)
0348             for (py::ssize_t j = 0; j < r.shape(1); j++)
0349                 for (py::ssize_t k = 0; k < r.shape(2); k++)
0350                     r(i, j, k) += 1.0;
0351     }, py::arg().noconvert());
0352 
0353 To obtain the proxy from an ``array`` object, you must specify both the data
0354 type and number of dimensions as template arguments, such as ``auto r =
0355 myarray.mutable_unchecked<float, 2>()``.
0356 
0357 If the number of dimensions is not known at compile time, you can omit the
0358 dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
0359 ``arr.unchecked<T>()``.  This will give you a proxy object that works in the
0360 same way, but results in less optimizable code and thus a small efficiency
0361 loss in tight loops.
0362 
0363 Note that the returned proxy object directly references the array's data, and
0364 only reads its shape, strides, and writeable flag when constructed.  You must
0365 take care to ensure that the referenced array is not destroyed or reshaped for
0366 the duration of the returned object, typically by limiting the scope of the
0367 returned instance.
0368 
0369 The returned proxy object supports some of the same methods as ``py::array`` so
0370 that it can be used as a drop-in replacement for some existing, index-checked
0371 uses of ``py::array``:
0372 
0373 - ``.ndim()`` returns the number of dimensions
0374 
0375 - ``.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
0376   the ``const T`` or ``T`` data, respectively, at the given indices.  The
0377   latter is only available to proxies obtained via ``a.mutable_unchecked()``.
0378 
0379 - ``.itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
0380 
0381 - ``.ndim()`` returns the number of dimensions.
0382 
0383 - ``.shape(n)`` returns the size of dimension ``n``
0384 
0385 - ``.size()`` returns the total number of elements (i.e. the product of the shapes).
0386 
0387 - ``.nbytes()`` returns the number of bytes used by the referenced elements
0388   (i.e. ``itemsize()`` times ``size()``).
0389 
0390 .. seealso::
0391 
0392     The file :file:`tests/test_numpy_array.cpp` contains additional examples
0393     demonstrating the use of this feature.
0394 
0395 Ellipsis
0396 ========
0397 
0398 Python provides a convenient ``...`` ellipsis notation that is often used to
0399 slice multidimensional arrays. For instance, the following snippet extracts the
0400 middle dimensions of a tensor with the first and last index set to zero.
0401 
0402 .. code-block:: python
0403 
0404    a = ...  # a NumPy array
0405    b = a[0, ..., 0]
0406 
0407 The function ``py::ellipsis()`` function can be used to perform the same
0408 operation on the C++ side:
0409 
0410 .. code-block:: cpp
0411 
0412    py::array a = /* A NumPy array */;
0413    py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];
0414 
0415 
0416 Memory view
0417 ===========
0418 
0419 For a case when we simply want to provide a direct accessor to C/C++ buffer
0420 without a concrete class object, we can return a ``memoryview`` object. Suppose
0421 we wish to expose a ``memoryview`` for 2x4 uint8_t array, we can do the
0422 following:
0423 
0424 .. code-block:: cpp
0425 
0426     const uint8_t buffer[] = {
0427         0, 1, 2, 3,
0428         4, 5, 6, 7
0429     };
0430     m.def("get_memoryview2d", []() {
0431         return py::memoryview::from_buffer(
0432             buffer,                                    // buffer pointer
0433             { 2, 4 },                                  // shape (rows, cols)
0434             { sizeof(uint8_t) * 4, sizeof(uint8_t) }   // strides in bytes
0435         );
0436     });
0437 
0438 This approach is meant for providing a ``memoryview`` for a C/C++ buffer not
0439 managed by Python. The user is responsible for managing the lifetime of the
0440 buffer. Using a ``memoryview`` created in this way after deleting the buffer in
0441 C++ side results in undefined behavior.
0442 
0443 We can also use ``memoryview::from_memory`` for a simple 1D contiguous buffer:
0444 
0445 .. code-block:: cpp
0446 
0447     m.def("get_memoryview1d", []() {
0448         return py::memoryview::from_memory(
0449             buffer,               // buffer pointer
0450             sizeof(uint8_t) * 8   // buffer size
0451         );
0452     });
0453 
0454 .. versionchanged:: 2.6
0455     ``memoryview::from_memory`` added.