Wrapping native code lets you keep Python’s ergonomics while calling into compiled code for performance-critical paths, hardware drivers, legacy libraries, or numerical algorithms.
Python is growing on me. It is expressive and productive, ideal for prototyping, quick to deliver, and easy to understand, and its huge base of modules is just one of many advantages. But sometimes it is slow, to say the least, while C and C++ are fast.
This guide is compiled from my recent experience wrapping native C and C++ code for Python. It should cover every major approach, compare their strengths and weaknesses, and walk you through real examples from simple to complex.
Why Wrap C or C++ Code?
There are several common reasons you might want to call C or C++ from Python:
- You have an existing C or C++ library that does exactly what you need, and rewriting it in Python would be impractical or would destroy its performance.
- You need raw speed for a tight inner loop — image processing, signal analysis, numerical simulation — where Python’s overhead is prohibitive.
- You are writing a Python interface to a hardware driver or system library that is only available as a compiled binary.
- You want to gradually migrate a legacy C or C++ codebase to Python without rewriting everything at once.
In all these cases, the answer is a binding layer: a thin bridge that lets Python code call into native compiled code as if it were just another Python module.
The Main Approaches
There are six tools you will encounter most often. Each targets a different situation.
ctypes
The main advantage of ctypes is that it is built into Python’s standard library. It lets you load any compiled shared library (.so on Linux, .dylib on macOS, .dll on Windows) and call its functions directly from Python with no compilation step and no extra packages. It is C only: it has no understanding of C++ classes, templates, or name mangling.
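Because the file name of a shared library differs by platform, loading code often has to pick it at run time. The following is a minimal sketch of that choice; the helper name `shared_library_filename` is hypothetical, and it only encodes the common naming conventions without checking that the file exists.

```python
import sys


def shared_library_filename(stem: str, platform: str = sys.platform) -> str:
    """Return the conventional shared-library file name for `stem`.

    Hypothetical helper: it encodes common naming conventions only.
    """
    if platform.startswith("win"):
        return f"{stem}.dll"
    if platform == "darwin":
        return f"lib{stem}.dylib"
    # Linux and most other Unix-like systems
    return f"lib{stem}.so"


print(shared_library_filename("math", "linux"))   # libmath.so
print(shared_library_filename("math", "darwin"))  # libmath.dylib
print(shared_library_filename("math", "win32"))   # math.dll
```

The resulting name can be joined with a directory and passed to ctypes.CDLL.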
cffi
cffi (C Foreign Function Interface) works similarly to ctypes but with a cleaner API. Instead of manually constructing Python type descriptors, you paste C declarations directly from a header file and cffi figures out the rest. It supports both CPython and PyPy, which makes it a good choice when PyPy compatibility is a requirement. Like ctypes, it handles C only.
pybind11
pybind11 is the gold standard for exposing C++ code to Python. It is a header-only C++11 library: you include it in your project and write binding code in C++. It handles classes, inheritance, templates, STL containers, NumPy arrays, operator overloading, keyword arguments, docstrings, and virtual method dispatch. It requires a compilation step but the result is a proper Python extension module that behaves exactly like a native Python module.
nanobind
nanobind is the spiritual successor to pybind11, written by the same author. It produces binaries that are 3 to 10 times smaller and compiles significantly faster, at the cost of requiring C++17 and dropping some of pybind11’s more exotic features. The API is very similar to pybind11, so migrating between them is straightforward. For new C++ projects that do not need to support older compilers, nanobind is increasingly the preferred choice.
Cython
Cython is its own compiled language that is a superset of Python. You write .pyx files that look almost exactly like Python, add optional static type annotations, and Cython compiles them to C. It is excellent for two distinct use cases:
- speeding up pure Python code (you do not need any C at all),
- and wrapping existing C or C++ APIs where the binding logic involves complex data structure manipulation.
It is used by major scientific Python projects including NumPy, SciPy, and lxml.
SWIG
SWIG (Simplified Wrapper and Interface Generator) generates bindings for over twenty languages from a single interface file. It is mature — it has been around since 1995 — and comprehensive. For Python-only projects it is usually overkill and its generated code is verbose and difficult to debug. Its real advantage appears when you need to ship an SDK that must support Python, Java, Go, and Ruby simultaneously from one set of interface definitions.
Strengths and Weaknesses
ctypes: strengths and weaknesses
Strengths: ships with Python so there is nothing to install, no compilation step is needed, works with any pre-compiled shared library, cross-platform, good for rapid prototyping and scripting.
Weaknesses: C only, extremely verbose type declarations for anything non-trivial, very easy to cause a segfault if types are declared incorrectly, no type safety, hard to maintain for large APIs.
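To give a feel for the verbosity mentioned above, here is a sketch that calls the C standard library's qsort with a Python comparison callback. It assumes a POSIX system where the C library can be located with ctypes.util.find_library; the names `COMPARE` and `compare_ints` are illustrative only.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) exposes the
# symbols already loaded into the process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature being described:
#   void qsort(void *base, size_t n, size_t size,
#              int (*compar)(const void *, const void *));
COMPARE = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, COMPARE]
libc.qsort.restype = None


def compare_ints(a, b):
    # a and b are pointers to int; [0] dereferences them.
    return (a[0] > b[0]) - (a[0] < b[0])


values = (ctypes.c_int * 5)(33, 5, 99, 1, 7)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), COMPARE(compare_ints))
print(list(values))  # [1, 5, 7, 33, 99]
```

Every pointer type, size, and callback signature has to be spelled out by hand, and a mistake in any of them is not caught until the call misbehaves.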
cffi: strengths and weaknesses
Strengths: cleaner API than ctypes because you paste header declarations directly, works on PyPy as well as CPython, ABI mode requires no compilation at all, good for wrapping large C APIs quickly.
Weaknesses: C only, ABI mode has runtime overhead compared to compiled bindings, slightly less Pythonic than pybind11 for complex cases.
pybind11: strengths and weaknesses
Strengths: excellent C++11/14/17 support, header-only so it is easy to vendor into a project, first-class NumPy and buffer protocol integration, natural Python idioms including keyword arguments and docstrings, large community and documentation, STL containers are automatically converted to Python types, virtual method dispatch works correctly.
Weaknesses: compilation is always required, slower compile times due to heavy template use, produces larger binaries than nanobind, binding code must be written in C++ so a C++ toolchain is needed even when the wrapped API is plain C.
nanobind: strengths and weaknesses
Strengths: 3 to 10 times smaller binary size than pybind11, significantly faster compilation, modern C++17 design, API is compatible enough with pybind11 that migration is easy.
Weaknesses: requires C++17 or later, smaller community and fewer examples online than pybind11, some pybind11 features are intentionally removed to keep the library lean.
Cython: strengths and weaknesses
Strengths: can speed up pure Python code without writing any C, familiar Python-like syntax, excellent NumPy integration, used and battle-tested by major scientific Python projects.
Weaknesses: introduces a new language (.pyx files) that developers must learn, compilation is always required, generated C code is not human-readable, debugging is more complex than debugging plain Python or C++, overkill if you only need to wrap an existing C++ library.
SWIG: strengths and weaknesses
Strengths: generates bindings for over 20 languages from one interface file, handles large APIs automatically, very mature and stable.
Weaknesses: interface file syntax is complex, generated code is verbose and hard to debug, producing a Pythonic API requires significant extra effort, steep learning curve, almost always overkill for Python-only projects.
Which Tool Should You Use?
The answer depends on what you are wrapping and what constraints you have.
If your code is pure C and you want zero extra dependencies, use ctypes. It is already in the standard library and you can call a shared library in a handful of lines.
If your code is pure C and you can install one package, use cffi. Its API is cleaner than ctypes and it works on PyPy if that matters to your users.
If your code is C++ and you want the most documented, most community-supported option, use pybind11. It is the safest default for new C++ projects and handles virtually every C++ pattern you will encounter.
If your code is C++ and binary size or build time is a concern, use nanobind. Provided you can require C++17, you get much leaner outputs with a very similar API.
If you want to speed up existing Python code without writing any C, or if you are wrapping a complex C API with a lot of data structure manipulation, use Cython.
If you need bindings for Python, Java, Go, Ruby, and others simultaneously from one codebase, use SWIG.
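The rules of thumb above can be condensed into a small lookup function. This is only a sketch of the reasoning in this section: the function name `choose_binding_tool` and its parameters are hypothetical, and real projects will weigh more factors than these.

```python
def choose_binding_tool(
    language: str,
    *,
    many_target_languages: bool = False,
    speed_up_python: bool = False,
    can_install_packages: bool = True,
    size_or_build_time_matters: bool = False,
) -> str:
    """Hypothetical helper that mirrors the rules of thumb in this section."""
    if many_target_languages:
        return "SWIG"
    if speed_up_python:
        return "Cython"
    lang = language.lower()
    if lang == "c":
        # cffi needs one extra package; ctypes ships with Python.
        return "cffi" if can_install_packages else "ctypes"
    if lang == "c++":
        return "nanobind" if size_or_build_time_matters else "pybind11"
    raise ValueError(f"unsupported language: {language!r}")


print(choose_binding_tool("C", can_install_packages=False))          # ctypes
print(choose_binding_tool("C++"))                                    # pybind11
print(choose_binding_tool("C++", size_or_build_time_matters=True))   # nanobind
```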
Example 1: ctypes — Calling a Simple C Function
This is the simplest possible example. We compile a C file to a shared library, then call its functions from Python using ctypes. No extra packages are needed.
First, the C source file:
// math_utils.c
#include <stdlib.h>

int add(int a, int b) {
    return a + b;
}

double multiply(double a, double b) {
    return a * b;
}
Compile it to a shared library:
# Linux / macOS
gcc -shared -fPIC -o libmath.so math_utils.c

# Windows
# cl /LD math_utils.c /Fe:math_utils.dll
Now call it from Python:
import ctypes
import os

# Resolve the library next to this script; abspath avoids an empty directory
# name when the script is started with a relative path
here = os.path.dirname(os.path.abspath(__file__))
lib = ctypes.CDLL(os.path.join(here, "libmath.so"))

# Always declare argument and return types explicitly
lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
lib.add.restype = ctypes.c_int
lib.multiply.argtypes = [ctypes.c_double, ctypes.c_double]
lib.multiply.restype = ctypes.c_double

print(lib.add(3, 4))          # 7
print(lib.multiply(2.5, 4))   # 10.0
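Always set argtypes and restype. Without restype, ctypes assumes the function returns a C int, which silently corrupts double and pointer return values. Without argtypes, ctypes has to guess how to pass each argument: Python integers go through as C int, and Python floats are rejected outright with an ArgumentError.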
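The effect of these declarations can be tried without compiling anything, by calling functions from the C standard library. This is a sketch under the assumption of a POSIX system; atof and abs are standard C functions, and everything else is illustrative.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) exposes the
# symbols already loaded into the process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# atof returns a double, so restype must say so; the default would be int.
libc.atof.argtypes = [ctypes.c_char_p]
libc.atof.restype = ctypes.c_double
value = libc.atof(b"3.5")
print(value)  # 3.5

# Without argtypes, ctypes refuses to guess how to pass a Python float.
try:
    libc.abs(2.5)
    float_rejected = False
except ctypes.ArgumentError:
    float_rejected = True
print(float_rejected)  # True
```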
Example 2: cffi — Wrapping a C String Function
cffi lets you paste C declarations directly from a header file. Here we wrap a string-processing C library.
The C source file:
// strutils.c
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

int count_vowels(const char *s) {
    int count = 0;
    while (*s) {
        char c = tolower((unsigned char)*s++);
        if (c=='a'||c=='e'||c=='i'||c=='o'||c=='u') count++;
    }
    return count;
}

void reverse_string(char *s) {
    size_t len = strlen(s);   /* strlen returns size_t, not int */
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }
}
Compile to a shared library:
gcc -shared -fPIC -o libstrutils.so strutils.c
Call it from Python using cffi:
from cffi import FFI
ffi = FFI()
# Paste your C declarations directly here — copy from your header file
ffi.cdef("""
int count_vowels(const char *s);
void reverse_string(char *s);
""")
lib = ffi.dlopen("./libstrutils.so")
# Pass Python bytes objects for const char* parameters
n = lib.count_vowels(b"Hello, World!")
print(f"Vowels: {n}") # 3
# For mutable char* parameters, create a C buffer
buf = ffi.new("char[]", b"Python")
lib.reverse_string(buf)
print(ffi.string(buf).decode()) # nohtyP
cffi’s ABI mode (ffi.dlopen) shown here is great for quick integration. For better performance and type safety, use cffi’s API mode (ffi.set_source followed by ffi.compile), which generates a compiled C extension at build time instead of loading dynamically at runtime.
Example 3: pybind11 — Exposing a C++ Class
pybind11 lets you expose full C++ classes with methods, properties, and operator overloads. The resulting Python object behaves exactly like a native Python object.
The C++ source and binding code in a single file:
// vec2.cpp
#include <pybind11/pybind11.h>
#include <pybind11/operators.h>
#include <cmath>
#include <sstream>
#include <string>

namespace py = pybind11;

struct Vec2 {
    double x, y;

    Vec2(double x = 0, double y = 0) : x(x), y(y) {}

    double length() const {
        return std::sqrt(x*x + y*y);
    }

    Vec2 normalized() const {
        double len = length();
        return {x / len, y / len};
    }

    Vec2 operator+(const Vec2 &o) const { return {x+o.x, y+o.y}; }
    Vec2 operator*(double s) const { return {x*s, y*s}; }

    std::string repr() const {
        std::ostringstream ss;
        ss << "Vec2(" << x << ", " << y << ")";
        return ss.str();
    }
};

PYBIND11_MODULE(vec2, m) {
    m.doc() = "2D vector module";
    py::class_<Vec2>(m, "Vec2")
        .def(py::init<double, double>(), py::arg("x")=0, py::arg("y")=0)
        .def_readwrite("x", &Vec2::x)
        .def_readwrite("y", &Vec2::y)
        .def("length", &Vec2::length)
        .def("normalized", &Vec2::normalized)
        .def(py::self + py::self)
        .def(py::self * double())
        .def("__repr__", &Vec2::repr);
}
The setup.py to build it:
from setuptools import setup, Extension
import pybind11

ext = Extension(
    "vec2",
    sources=["vec2.cpp"],
    include_dirs=[pybind11.get_include()],
    language="c++",
    extra_compile_args=["-std=c++14"],
)

setup(name="vec2", ext_modules=[ext])
Build and use it:
# pip install pybind11
# python setup.py build_ext --inplace
from vec2 import Vec2

a = Vec2(3.0, 4.0)
b = Vec2(1.0, 2.0)

# __repr__ is built with a C++ stream, which prints 3.0 as "3"
print(a)               # Vec2(3, 4)
print(a.length())      # 5.0
print(a + b)           # Vec2(4, 6)
print(a * 0.5)         # Vec2(1.5, 2)
print(a.normalized())  # Vec2(0.6, 0.8)

# Attributes work exactly like Python attributes
a.x = 10.0
print(a)               # Vec2(10, 4)
py::arg() declares keyword arguments with default values, making the Python API feel native. py::self enables operator overloading using pybind11’s built-in operator helpers. The entire C++ struct is accessible as if it were a Python dataclass.
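For comparison, here is a pure-Python stand-in with the same interface as the bound struct. It is a sketch, not part of the extension: such a class can serve as a fallback or as a reference when testing the compiled module. Note that its repr is the dataclass default, not the string produced by the C++ repr() method.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec2:
    x: float = 0.0
    y: float = 0.0

    def length(self) -> float:
        return math.hypot(self.x, self.y)

    def normalized(self) -> "Vec2":
        n = self.length()
        return Vec2(self.x / n, self.y / n)

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, s: float) -> "Vec2":
        return Vec2(self.x * s, self.y * s)


a = Vec2(3.0, 4.0)
print(a.length())            # 5.0
print(a + Vec2(1.0, 2.0))    # Vec2(x=4.0, y=6.0)
print(a * 0.5)               # Vec2(x=1.5, y=2.0)
```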
Example 4: pybind11 — Zero-Copy Numpy Array Processing
One of the most common real-world use cases: accept a NumPy array in Python, process it at C++ speed with direct memory access and no copy of the input, and return the result as another NumPy array.
// image_ops.cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>

namespace py = pybind11;

// Read the input through NumPy's own buffer (no copy if it is already float32)
py::array_t<float> gaussian_blur_1d(
    py::array_t<float> input, float sigma)
{
    auto in = input.unchecked<1>();
    auto out = py::array_t<float>(input.size());
    auto o = out.mutable_unchecked<1>();

    // py::ssize_t is portable; the bare POSIX ssize_t is not available on MSVC
    py::ssize_t N = in.shape(0);
    int radius = static_cast<int>(3 * sigma);

    for (py::ssize_t i = 0; i < N; i++) {
        float sum = 0.0f, weight = 0.0f;
        for (int k = -radius; k <= radius; k++) {
            py::ssize_t j = i + k;
            if (j < 0 || j >= N) continue;
            float w = std::exp(-0.5f * k*k / (sigma*sigma));
            sum += in(j) * w;
            weight += w;
        }
        o(i) = sum / weight;
    }
    return out;
}

PYBIND11_MODULE(image_ops, m) {
    m.doc() = "Image processing ops backed by C++";
    m.def("gaussian_blur_1d", &gaussian_blur_1d,
          py::arg("input"), py::arg("sigma") = 1.0f,
          "Apply 1D Gaussian blur to a float32 numpy array");
}
The setup.py, this time including NumPy’s include path:
from setuptools import setup, Extension
import pybind11
import numpy

ext = Extension(
    "image_ops",
    sources=["image_ops.cpp"],
    include_dirs=[pybind11.get_include(), numpy.get_include()],
    language="c++",
    extra_compile_args=["-std=c++14", "-O3"],
)

setup(name="image_ops", ext_modules=[ext])
Using it from Python:
import numpy as np
import time
from image_ops import gaussian_blur_1d
signal = np.random.randn(100_000).astype(np.float32)
t0 = time.perf_counter()
result = gaussian_blur_1d(signal, sigma=2.0)
t1 = time.perf_counter()
print(f"C++ blur: {(t1-t0)*1000:.2f}ms")
print(result.dtype) # float32
print(result.shape) # (100000,)
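py::array_t<T> gives you direct access to the underlying NumPy memory buffer. unchecked<N>() gives maximum performance by skipping bounds checking. mutable_unchecked<N>() is used for output arrays where you need write access. Because we read the raw buffer, the input is not copied as long as it is already a float32 array; pybind11 converts any other dtype first, which does make a copy. The output is a freshly allocated array that is handed back to Python without another copy.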
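The zero-copy idea rests on Python's buffer protocol, and it can be illustrated with the standard library alone. The sketch below is not how pybind11 is implemented; it only shows two objects sharing one block of memory, using array.array as the owner and a ctypes array as the view.

```python
import array
import ctypes

samples = array.array("f", [1.0, 2.0, 3.0])

# from_buffer creates a ctypes array that views the same memory: no copy.
FloatArray3 = ctypes.c_float * len(samples)
view = FloatArray3.from_buffer(samples)

view[0] = 10.0         # write through the ctypes view...
print(samples[0])      # 10.0  ...and the original array sees the change

# The view starts at the same address as the array's own buffer.
address, _length = samples.buffer_info()
print(ctypes.addressof(view) == address)  # True
```

A compiled extension that receives a buffer this way reads and writes the caller's memory directly, which is why no data needs to be copied across the language boundary.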
Example 5: pybind11 — Class Inheritance and Python Callbacks
This is the most advanced pybind11 pattern: a C++ abstract base class that Python code can subclass and override, combined with C++ calling back into Python callables. This pattern is the foundation of plugin systems, rendering engines, and event-driven architectures.
// engine.cpp
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <functional>
#include <string>
#include <vector>
#include <memory>

namespace py = pybind11;

// Abstract C++ base class
class Processor {
public:
    virtual ~Processor() = default;
    virtual std::string process(const std::string &input) = 0;
    virtual std::string name() const { return "Processor"; }

    // Template method: calls the virtual process()
    std::string run(const std::string &input) {
        return "[" + name() + "] " + process(input);
    }
};

// Trampoline class: required so Python can override virtual methods
class PyProcessor : public Processor {
public:
    using Processor::Processor;

    std::string process(const std::string &input) override {
        PYBIND11_OVERRIDE_PURE(std::string, Processor, process, input);
    }

    std::string name() const override {
        PYBIND11_OVERRIDE(std::string, Processor, name);
    }
};

// Pipeline holds multiple processors and a Python callback
class Pipeline {
    std::vector<std::shared_ptr<Processor>> steps;
    std::function<void(const std::string&)> on_step;

public:
    void add(std::shared_ptr<Processor> p) { steps.push_back(p); }

    void set_callback(std::function<void(const std::string&)> cb) {
        on_step = cb;
    }

    std::string execute(const std::string &input) {
        std::string current = input;
        for (auto &step : steps) {
            current = step->run(current);
            if (on_step) on_step(current);  // call into Python
        }
        return current;
    }
};

PYBIND11_MODULE(engine, m) {
    py::class_<Processor, PyProcessor, std::shared_ptr<Processor>>(m, "Processor")
        .def(py::init<>())
        .def("process", &Processor::process)
        .def("name", &Processor::name)
        .def("run", &Processor::run);

    py::class_<Pipeline>(m, "Pipeline")
        .def(py::init<>())
        // keep_alive<1, 2> ties each processor's Python object to the pipeline,
        // so Python overrides stay reachable after the caller drops its reference
        .def("add", &Pipeline::add, py::keep_alive<1, 2>())
        .def("set_callback", &Pipeline::set_callback)
        .def("execute", &Pipeline::execute);
}
Using it from Python — including subclassing the C++ abstract class:
from engine import Processor, Pipeline

# Subclass the C++ abstract base class directly in Python
class UpperProcessor(Processor):
    def process(self, text: str) -> str:
        return text.upper()

    def name(self) -> str:
        return "UpperCase"

class ExclamProcessor(Processor):
    def process(self, text: str) -> str:
        return text + "!!!"

    def name(self) -> str:
        return "Exclaim"

pipeline = Pipeline()
pipeline.add(UpperProcessor())
pipeline.add(ExclamProcessor())

# Pass a Python lambda as a C++ std::function callback
log = []
pipeline.set_callback(lambda s: log.append(s))

result = pipeline.execute("hello world")
print(result)  # [Exclaim] [UpperCase] HELLO WORLD!!!
print(log)     # intermediate results captured by the callback
The trampoline class (PyProcessor) is what makes Python subclassing of C++ virtual classes work. It intercepts virtual calls and routes them through pybind11’s override machinery. PYBIND11_OVERRIDE_PURE raises a Python RuntimeError ("Tried to call pure virtual function") if the subclass does not implement the method. pybind11/functional.h makes std::function interoperable with Python callables including lambdas and regular functions.
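It can help to see the same control flow written in plain Python first. The sketch below mirrors the template-method and callback structure of the C++ classes using abc; it is an illustration of the pattern, not the extension itself, and it is handy as a stand-in when testing code that depends on the engine module.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class Processor(ABC):
    @abstractmethod
    def process(self, text: str) -> str:
        """Subclasses must override this, like the pure virtual in C++."""

    def name(self) -> str:
        return "Processor"

    def run(self, text: str) -> str:
        # Template method: calls the overridable name() and process()
        return f"[{self.name()}] {self.process(text)}"


class Pipeline:
    def __init__(self) -> None:
        self._steps: List[Processor] = []
        self._on_step: Optional[Callable[[str], None]] = None

    def add(self, step: Processor) -> None:
        self._steps.append(step)

    def set_callback(self, callback: Callable[[str], None]) -> None:
        self._on_step = callback

    def execute(self, text: str) -> str:
        current = text
        for step in self._steps:
            current = step.run(current)
            if self._on_step is not None:
                self._on_step(current)
        return current


class UpperProcessor(Processor):
    def process(self, text: str) -> str:
        return text.upper()

    def name(self) -> str:
        return "UpperCase"


class ExclamProcessor(Processor):
    def process(self, text: str) -> str:
        return text + "!!!"

    def name(self) -> str:
        return "Exclaim"


pipeline = Pipeline()
pipeline.add(UpperProcessor())
pipeline.add(ExclamProcessor())
log: List[str] = []
pipeline.set_callback(log.append)
result = pipeline.execute("hello world")
print(result)  # [Exclaim] [UpperCase] HELLO WORLD!!!
print(log)     # ['[UpperCase] HELLO WORLD', '[Exclaim] [UpperCase] HELLO WORLD!!!']
```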
Example 6: Cython — Accelerating Pure Python Code
Cython’s most accessible use case requires no C code at all. You take a slow Python loop, add static type declarations, and Cython compiles it to C that runs dramatically faster. Here we compare an untyped trial-division prime finder with its statically typed Cython equivalent.
The Cython source file (primes.pyx):
# primes.pyx
# cython: boundscheck=False, wraparound=False

def primes_python(int n):
    """Reference implementation: plain Python logic with untyped loop variables.

    It lives in the .pyx file, so it is still compiled by Cython.
    """
    result = []
    for i in range(2, n):
        is_prime = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            result.append(i)
    return result

def primes_cython(int n):
    """Cython version: static types enable C-speed loops."""
    cdef int i, j
    cdef bint is_prime
    result = []
    for i in range(2, n):
        is_prime = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            result.append(i)
    return result
The setup.py to compile it:
from setuptools import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("primes.pyx", compiler_directives={
"language_level": "3"
}))
Benchmark from Python:
# pip install cython
# python setup.py build_ext --inplace
import time
from primes import primes_python, primes_cython
N = 5000
t0 = time.perf_counter()
r1 = primes_python(N)
t1 = time.perf_counter()
r2 = primes_cython(N)
t2 = time.perf_counter()
print(f"Python: {(t1-t0)*1000:.1f}ms")
print(f"Cython: {(t2-t1)*1000:.1f}ms")
print(f"Speedup: {(t1-t0)/(t2-t1):.1f}x")
# Typical result: 10 to 100x speedup from type declarations alone
cdef declares a C-typed variable. bint is Cython’s boolean integer type. The compiler directives boundscheck=False and wraparound=False disable bounds checking and negative-index handling on indexing operations; this function does no indexing, so they change nothing here, but they squeeze out additional performance in tight loops over arrays and memoryviews. The logic of the function is identical to the untyped version; the only change is the cdef type declarations.
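Since both functions in primes.pyx are compiled by Cython, a baseline that really runs in the interpreter has to live in an ordinary .py file. The following sketch provides one; the name `primes_pure` is illustrative, and the timing it prints depends entirely on the machine.

```python
import timeit


def primes_pure(n):
    """Trial division with the same logic as the .pyx versions, interpreted."""
    result = []
    for i in range(2, n):
        is_prime = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            result.append(i)
    return result


print(primes_pure(20))  # [2, 3, 5, 7, 11, 13, 17, 19]

# Best of three runs, in milliseconds
best = min(timeit.repeat(lambda: primes_pure(2000), number=1, repeat=3))
print(f"pure Python: {best * 1000:.1f}ms")
```

Using the best of several runs reduces noise from other processes, which matters when the compiled version finishes in well under a millisecond.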
Summary and Recommendations
Here is a concise decision guide:
- Use ctypes when you need to call a pre-compiled C library from Python immediately, with no extra packages and no build step. It is in the standard library and works everywhere. The cost is verbosity and the lack of type safety.
- Use cffi when you are wrapping a C library and want a cleaner API than ctypes, or when you need PyPy compatibility. Paste your header declarations and call dlopen.
- Use pybind11 when you are wrapping C++ code. It is the recommended default for modern C++ projects. It handles virtually every C++ pattern (classes, templates, STL, NumPy, virtual dispatch, callbacks) and has the largest community and best documentation of any C++ binding tool.
- Use nanobind when you are starting a new C++ project that requires C++17 and you care about binary size or compile time. The API is close enough to pybind11 that switching is easy.
- Use Cython when you want to speed up Python code without writing any C, or when you are wrapping a C or C++ API that requires complex Python-side data manipulation. It is the tool of choice for the scientific Python ecosystem.
- Use SWIG when you need to generate bindings for Python, Java, Go, Ruby, and other languages simultaneously from one interface definition. For Python-only projects it is almost always overkill.