Exposing FFI from the Rust library

Wikipedia defines FFI (foreign function interface) as a mechanism by which a program written in one programming language can call routines or make use of services written in another.

FFI can be used to speed up program execution (a common practice in dynamic languages like Python or Ruby), or simply to reuse a library written in another language (for example, TensorFlow has its core library written in C++ and exposes a C API that other TensorFlow libraries build on).

Writing an FFI for a Rust library is not very hard, but it has some challenges and scary parts, mostly because you are going to work with pointers and unsafe blocks[1].
What happens there is outside of Rust’s safe memory model; in other words, the compiler cannot check that everything is all right, so memory management and safety guarantees are up to the developer.

In this post I’m going to describe my experience with Rust and FFI based on the battery-ffi crate, which exposes FFI bindings for another crate of mine, battery.
What I wanted was to provide a C interface for creating Rust-specific structs and fetching data from them.

First things first

You need to add libc, which provides the definitions necessary to interoperate with C code, to the crate dependencies, and set crate-type to cdylib[2], which builds a dynamic library (a .so, .dylib or .dll file, depending on your target OS):

[dependencies]
libc = "0.2"  # crates.io rejects "*" wildcard version requirements

[lib]
crate-type = ["cdylib"]

It might be a good idea to separate the FFI layer from the “main” library and move the unsafe code into a new crate, similar to the community convention for *-sys[3] crates, but the other way around this time: a -sys crate wraps a C library for Rust, while this crate wraps a Rust library for C.
In addition, Rust libraries use crate-type = ["rlib"] by default, while an FFI crate should be a cdylib.
There is no common naming convention, but such crates usually have an -ffi or -capi suffix.

FFI syntax

Here is an example function that returns the battery percentage (in the 0.0…100.0 % range) from the Battery struct:

#[no_mangle]
pub unsafe extern fn battery_get_percentage(ptr: *const Battery) -> libc::c_float {
    unimplemented!()  // The body follows the same pattern as battery_get_energy below
}

Its declaration starts with the #[no_mangle] attribute, which disables name mangling. In short, it lets other languages find the function in the resulting library under the expected name (battery_get_percentage in our case) rather than under a compiler-generated name like _ZN7battery_get_percentage17h5179a29d7b114f74E; who would want to call functions named like that?

Then comes the function definition itself, with two additional keywords: unsafe and extern.

The extern keyword makes the function use the C calling convention (a bare extern is shorthand for extern "C"). The Wikipedia page on calling conventions explains why this is needed, and the Rust Nomicon lists all available calling conventions.

You have probably seen the unsafe keyword used to mark a block (as in unsafe { .. do something scary here .. }). Here the whole function is marked unsafe because incorrect usage, for example passing a NULL or dangling pointer, results in undefined behavior, so it is up to the caller to use it properly and be aware of the possible consequences.
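The declaration above leaves the body out, so here is a minimal sketch of how the whole function might look. It assumes a hypothetical Battery struct with a single percentage field (the real battery crate type is richer) and uses std::os::raw::c_float, which, like libc::c_float, is an alias for f32, so the snippet has no dependencies. It also spells out extern "C", which is what a bare extern means, and #[unsafe(no_mangle)], the form required by the Rust 2024 edition and accepted by rustc 1.82 and newer in every edition.

```rust
use std::os::raw::c_float;

/// Hypothetical stand-in for the real `battery::Battery` type.
pub struct Battery {
    percentage: f32, // charge level in the 0.0..=100.0 range
}

/// Returns the charge level of the battery behind `ptr`.
///
/// # Safety
/// `ptr` must point to a live `Battery`; NULL trips the assertion.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_get_percentage(ptr: *const Battery) -> c_float {
    assert!(!ptr.is_null());
    // Reborrow the raw pointer as a shared reference.
    let battery = unsafe { &*ptr };
    battery.percentage
}
```

Calling it with a pointer to a live Battery returns the stored percentage by value, with no allocation involved.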

Returning parameters

In my case I want to expose a few Rust structs to the outside world, but they may contain complex implementation details, and it would be a bad idea to force end users to deal with those. For example, if my Manager struct contains a Mutex, how should that be represented in C or Python?[4]
That’s why I’m going to hide the structs’ implementations behind opaque pointers: basically, I return a pointer to a chunk of heap memory and provide functions to fetch the needed data through that pointer.
Heap allocation is mandatory here. Rust allocates values on the stack by default (types like Vec or HashMap keep only their contents on the heap), and stack data is freed when the function returns, so a pointer to it would dangle as soon as the caller receives it. Box is your best friend in that case: it moves the value to the heap, where it lives until you explicitly free it.

In most cases you don’t need to wrap primitive types such as u8 or i32 in a Box; it is totally okay to return them by value. The Rust FFI Omnibus and the Rust FFI Guide both provide multiple examples of how to do that.
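To make the opaque-pointer idea concrete, here is a minimal sketch assuming a hypothetical Manager that wraps its state in a Mutex. C code never sees the fields, only a pointer, while the battery count, a primitive, is returned by value. The names manager_new, manager_battery_count and manager_free, as well as the field layout, are made up for illustration and are not the battery-ffi API.

```rust
use std::os::raw::c_uint;
use std::sync::Mutex;

/// Hypothetical manager: the Mutex has no sensible C representation,
/// so C code only ever holds an opaque pointer to this struct.
pub struct Manager {
    charges: Mutex<Vec<f32>>,
}

/// Hypothetical constructor: allocates on the heap and hands ownership to C.
#[unsafe(no_mangle)]
pub extern "C" fn manager_new() -> *mut Manager {
    let manager = Manager { charges: Mutex::new(vec![42.0, 97.5]) };
    // A pointer to the local `manager` would dangle after returning;
    // Box keeps the value alive on the heap instead.
    Box::into_raw(Box::new(manager))
}

/// Hypothetical getter: a primitive result is simply returned by value.
///
/// # Safety
/// `ptr` must come from `manager_new` and must not have been freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn manager_battery_count(ptr: *const Manager) -> c_uint {
    assert!(!ptr.is_null());
    let manager = unsafe { &*ptr };
    manager.charges.lock().unwrap().len() as c_uint
}

/// Hypothetical destructor; NULL is ignored.
///
/// # Safety
/// `ptr` must be NULL or come from `manager_new`, freed at most once.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn manager_free(ptr: *mut Manager) {
    if !ptr.is_null() {
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```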

Now let’s take a look at this function:

#[no_mangle]
pub extern fn battery_manager_new() -> *mut Manager {
    let manager: Manager = Manager::new();
    let boxed: Box<Manager> = Box::new(manager);

    Box::into_raw(boxed)
}

As you can see, it creates a Manager struct, Box::new moves it to the heap, and Box::into_raw returns a raw pointer to the heap location where it is stored. Note that this function is not marked unsafe, because there is no way to trigger undefined behavior by calling it.

Passing parameters

The next function accepts a pointer to a previously created Manager struct and calls the Manager::iter method on it, creating a Batteries struct:

#[no_mangle]
pub unsafe extern fn battery_manager_iter(ptr: *mut Manager) -> *mut Batteries {
    assert!(!ptr.is_null());
    let manager = &*ptr;

    Box::into_raw(Box::new(manager.iter()))
}

The first thing to do is to ensure that the passed pointer is not NULL:

assert!(!ptr.is_null());

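You really should do this for every pointer you receive, because the input comes from outside and you cannot expect it to be valid; it is better to fail early and predictably than to trigger undefined behavior. Keep in mind that this check only catches NULL, not dangling pointers, and that a panic cannot unwind out of an extern function: since Rust 1.81 the process aborts, and in earlier versions it was undefined behavior.
After that we create a reference to our struct from this pointer: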

let manager = &*ptr;

This line relies on type inference, so here is a longer version in case the &* part looks weird to you (this version will not compile[5], but it makes it easier to understand what happens):

let manager_struct: Manager = *ptr;
let manager: &Manager = &manager_struct;

Here we dereference ptr and immediately take a reference again, which gives us a reference to our struct.
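If you would rather report a NULL pointer than panic on it, raw pointers have an as_ref method that performs the NULL check and the &* reborrow in one step, returning None for NULL. This is a swapped-in alternative to the assert! approach used here, sketched with a hypothetical Manager type and a made-up -1 error code; as_ref still cannot detect dangling pointers, so the function stays unsafe.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for the real Manager type.
pub struct Manager {
    battery_count: i32,
}

/// Hypothetical getter: returns the battery count, or -1 if `ptr` is NULL.
///
/// # Safety
/// `ptr` must be NULL or point to a live `Manager`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn manager_battery_count_checked(ptr: *const Manager) -> c_int {
    // `as_ref` yields None for NULL and Some(&Manager) otherwise;
    // it cannot tell whether a non-NULL pointer is dangling.
    match unsafe { ptr.as_ref() } {
        Some(manager) => manager.battery_count,
        None => -1,
    }
}
```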

In my case the Manager::iter method returns a Batteries iterator, which I want to expose too, so I do the same as in the battery_manager_new function:

Box::into_raw(Box::new(manager.iter()))

Freeing it

After the Box::into_raw call Rust forgets about this value, so it is now our responsibility to free its memory manually later, or to live with memory leaks. Luckily, it is quite simple:

#[no_mangle]
pub unsafe extern fn battery_manager_free(ptr: *mut Manager) {
    if ptr.is_null() {
        return;
    }

    drop(Box::from_raw(ptr)); // re-own the allocation and free it
}

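As a nifty addition, we silently ignore NULL pointers, since there is nothing to free, mirroring C’s free(NULL). Note that this check does not protect against double frees: calling battery_manager_free twice with the same non-NULL pointer calls Box::from_raw twice on it, which is undefined behavior, so the caller must free each pointer exactly once.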

In battery_manager_new we converted the Box into a raw pointer with Box::into_raw; now we convert it back. What happens next is the usual Rust “magic”: the value is owned by a Box<T> and controlled by safe Rust again, so it is dropped at the end of the function, properly calling destructors and freeing the memory.

The same should be done with the *mut Batteries pointer created by battery_manager_iter.
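To see that Box::from_raw really runs destructors, the sketch below gives a hypothetical Batteries type a Drop implementation that bumps a static counter. The battery_iterator_new and battery_iterator_free names are assumptions (the first stands in for battery_manager_iter), not the actual battery-ffi functions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Batteries` values have been dropped.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the iterator returned by `battery_manager_iter`.
pub struct Batteries {
    remaining: Vec<f32>,
}

impl Drop for Batteries {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical constructor standing in for `battery_manager_iter`.
#[unsafe(no_mangle)]
pub extern "C" fn battery_iterator_new() -> *mut Batteries {
    Box::into_raw(Box::new(Batteries { remaining: vec![55.0, 80.0] }))
}

/// Hypothetical destructor: re-owns `ptr` and drops it, which runs `Drop`
/// and frees the heap allocation. NULL is ignored, like C's `free(NULL)`.
///
/// # Safety
/// `ptr` must be NULL or come from `battery_iterator_new`, and each
/// non-NULL pointer must be freed exactly once.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_iterator_free(ptr: *mut Batteries) {
    if ptr.is_null() {
        return;
    }
    drop(unsafe { Box::from_raw(ptr) });
}
```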

Creating the getters

The primary goal of my battery crate is to provide various information about batteries (as in your laptop), so now I need to create multiple “getter” functions that fetch data from a previously created *const Battery pointer (its creation is not shown, but it works very much like the structs in the snippets above).

The following example should be easy to understand now: we receive a raw pointer, validate it, and take a reference to the Battery struct:

#[no_mangle]
pub unsafe extern fn battery_get_energy(ptr: *const Battery) -> u32 {
    assert!(!ptr.is_null());
    let battery = &*ptr;

    battery.energy()
}

After taking the reference I simply return the u32 from the Battery::energy method. Rust’s u32 is exactly C’s uint32_t (libc::uint32_t is merely a deprecated alias for it), so the value is passed to the caller as is.

You may have noticed a small difference from the previous examples: instead of *mut, this function receives a *const pointer.

Other posts cover the difference in depth; here is a short summary by matklad:

If you are using raw pointers for FFI (as parameter and return types of extern “C” functions), then *const vs *mut is purely a question of documenting intent, and does not affect the generated code at all. However, documenting intent is important, because C and C++ have a rule that you can’t mutate a constant object.

Since I’m not going to mutate the battery state here, I prefer the *const notation, which describes my intentions with that argument precisely.

Handling optional results

Some Battery methods return Option<T> types, which can’t be mapped to the C ABI as is, and their T values can’t be returned as NULL either, since they are not pointers but primitive types such as f32.

There are three widely adopted ways to solve that problem:

  1. return some impossible value (such as the -1 result commonly used in C)
  2. set a thread-local variable (usually called errno) and check it every time after receiving an “optional” result
  3. or create a struct similar to the following code, return it, and check whether present == true:

     #[repr(C)]
     struct COption<T> {
         value: T,
         present: bool,
     }
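A minimal sketch of the third approach, assuming a hypothetical Battery whose temperature reading may be missing. COption<f32> is FFI-safe because the struct is #[repr(C)] and both f32 and bool map to C types (bool to C’s _Bool); the From conversion and the battery_get_temperature name are illustrative additions, not part of the battery-ffi API.

```rust
use std::os::raw::c_float;

/// The struct from the list above: `value` is meaningful only if `present`.
#[repr(C)]
pub struct COption<T> {
    pub value: T,
    pub present: bool,
}

impl<T: Default> From<Option<T>> for COption<T> {
    fn from(opt: Option<T>) -> Self {
        match opt {
            Some(value) => COption { value, present: true },
            // C callers must ignore `value` here; Default just fills the slot.
            None => COption { value: T::default(), present: false },
        }
    }
}

/// Hypothetical battery whose temperature sensor may be missing.
pub struct Battery {
    temperature: Option<f32>,
}

/// Hypothetical getter returning the optional temperature as a COption.
///
/// # Safety
/// `ptr` must point to a live `Battery`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_get_temperature(ptr: *const Battery) -> COption<c_float> {
    assert!(!ptr.is_null());
    let battery = unsafe { &*ptr };
    COption::from(battery.temperature)
}
```

The cost of this design is that every caller has to check present before reading value, but no value has to be sacrificed as a marker.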

The Rust FFI Guide has a comprehensive description of the second method. I chose the first one for now, returning values that are impossible in real life (your laptop battery can’t be 340282350000000000000000000000000000000 °C, which is f32::MAX, right?).
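The sentinel approach can be sketched as follows, assuming a hypothetical Battery with an optional temperature and a made-up BATTERY_TEMPERATURE_UNKNOWN constant equal to f32::MAX, the huge value quoted above. C callers would compare the result against that same value.

```rust
use std::os::raw::c_float;

/// Hypothetical sentinel meaning "no value": f32::MAX (about 3.4e38),
/// which no real battery temperature can reach.
pub const BATTERY_TEMPERATURE_UNKNOWN: c_float = f32::MAX;

/// Hypothetical battery whose temperature sensor may be missing.
pub struct Battery {
    temperature: Option<f32>,
}

/// Hypothetical getter: maps None to the sentinel.
///
/// # Safety
/// `ptr` must point to a live `Battery`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_get_temperature(ptr: *const Battery) -> c_float {
    assert!(!ptr.is_null());
    let battery = unsafe { &*ptr };
    battery.temperature.unwrap_or(BATTERY_TEMPERATURE_UNKNOWN)
}
```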

Handling string results

C strings and Rust strings are very different types and you can’t simply cast one into the other; the official documentation provides a long list of the differences between them. Fortunately, in my case I don’t need to receive strings, only to return them.
This can be done much like the previous examples, where we used Box’ed values.

A C string (char*) is basically a pointer to a chunk of memory ending with a nul byte, so we need to allocate some memory on the heap and put our UTF-8 string[6] there. Rust provides the CString type, which is exactly what we need: an owned, nul-terminated, C-compatible string allocated on the heap.

In the following example battery.serial_number() returns an Option<&str>, which we convert into a CString and then, just as in the previous examples, into a raw pointer that is returned to the caller.
If battery.serial_number() returns None, we return a NULL pointer, marking the result as non-existent.

Since we have allocated memory on the heap again, we must manage it manually and free it after use, which is done almost the same way as before:

use std::ffi::CString;
use std::ptr;

#[no_mangle]
pub unsafe extern fn battery_get_serial_number(ptr: *const Battery) -> *mut libc::c_char {
    assert!(!ptr.is_null());
    let battery = &*ptr;

    match battery.serial_number() {
        Some(sn) => {
            // Panics (aborting across the FFI boundary) if `sn` contains an interior nul byte
            let c_str = CString::new(sn).unwrap();
            c_str.into_raw()
        },
        None => ptr::null_mut(),
    }
}

#[no_mangle]
pub unsafe extern fn battery_str_free(ptr: *mut libc::c_char) {
    if ptr.is_null() {
        return;
    }

    drop(CString::from_raw(ptr)); // re-own the string and free it
}
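To exercise the pair from the caller’s side, the sketch below plays the C program in Rust: it reads the returned pointer with CStr::from_ptr, copies the text out, and hands the pointer back to battery_str_free. The Battery struct holding an Option<String> and the read_serial helper are hypothetical, c_char comes from std::os::raw instead of libc, and interior nul bytes are mapped to NULL instead of unwrapping, a small deviation from the code above.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical stand-in for the real Battery type.
pub struct Battery {
    serial: Option<String>,
}

impl Battery {
    fn serial_number(&self) -> Option<&str> {
        self.serial.as_deref()
    }
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_get_serial_number(ptr: *const Battery) -> *mut c_char {
    assert!(!ptr.is_null());
    let battery = unsafe { &*ptr };
    match battery.serial_number() {
        // CString::new fails on interior nul bytes; treat that as "no value".
        Some(sn) => CString::new(sn).map_or(ptr::null_mut(), CString::into_raw),
        None => ptr::null_mut(),
    }
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_str_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Hypothetical caller: copies the string out, then frees the original.
fn read_serial(battery: &Battery) -> Option<String> {
    let raw = unsafe { battery_get_serial_number(battery) };
    if raw.is_null() {
        return None;
    }
    let owned = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    unsafe { battery_str_free(raw) };
    Some(owned)
}
```

The important rule is that the string goes back to battery_str_free rather than to C’s free(), because only Rust knows how the CString was allocated.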

In the opposite case, when you need to receive strings from C, it is critical to remember that C strings may not only use encodings other than UTF-8 but also have different character sizes (char vs. wchar_t), so it is a really big deal and will be skipped in this post.
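For the narrow case where the caller promises a nul-terminated, UTF-8 encoded char*, borrowing the input with CStr::from_ptr and validating it with to_str is enough; other encodings and wide characters still need real transcoding. The battery_serial_matches function, its hard-coded serial, and its 1/0/-1 return convention are hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Hypothetical: compares a caller-supplied string with a fixed serial.
/// Returns 1 on match, 0 on mismatch, -1 for NULL or non-UTF-8 input.
///
/// # Safety
/// `input` must be NULL or point to a nul-terminated string that stays
/// valid and unmodified for the duration of the call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn battery_serial_matches(input: *const c_char) -> c_int {
    if input.is_null() {
        return -1;
    }
    // Borrow the C string without copying; `to_str` validates UTF-8.
    let c_str = unsafe { CStr::from_ptr(input) };
    match c_str.to_str() {
        Ok(s) if s == "SN-1234" => 1,
        Ok(_) => 0,
        Err(_) => -1,
    }
}
```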

Bindings generation

After building you will have a library file, which you can publish or send to client programmers to make them a little happier. Except they will need to re-declare your exported definitions in their own languages, as Python’s ctypes requires:

import ctypes

class Manager(ctypes.Structure):
    pass

lib = ctypes.cdll.LoadLibrary('libmy_lib_ffi.so')

lib.battery_manager_new.argtypes = None
lib.battery_manager_new.restype = ctypes.POINTER(Manager)
lib.battery_manager_free.argtypes = (ctypes.POINTER(Manager), )
lib.battery_manager_free.restype = None

Luckily for us again, binding generators exist: these tools can parse C header files and output generated code in the required language.
With the help of the cbindgen crate we can automatically generate a .h file describing our FFI interface and feed it into a binding generator later.

Embedding cbindgen is quite easy. First of all, we need to add it as a build dependency in Cargo.toml:

[build-dependencies]
cbindgen = "0.8.0"

Next, we need a cbindgen.toml file next to Cargo.toml:

include_guard = "my_lib_ffi_h"
autogen_warning = "/* Warning, this file is autogenerated by cbindgen. Don't modify this manually. */"
language = "C"

And the build script (build.rs):

use std::env;
use std::path::PathBuf;

fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR")
        .expect("CARGO_MANIFEST_DIR env var is not defined");
    let out_dir = PathBuf::from(env::var("OUT_DIR")
        .expect("OUT_DIR env var is not defined"));

    let config = cbindgen::Config::from_file("cbindgen.toml")
        .expect("Unable to find cbindgen.toml configuration file");

    cbindgen::generate_with_config(&crate_dir, config)
        .unwrap()
        .write_to_file(out_dir.join("my_lib_ffi.h"));
}

Now, after running cargo build, the OUT_DIR directory will contain my_lib_ffi.h with all the needed information.

A side note: I found that this build script fails with a cryptic error when building documentation on docs.rs, so I feature-gated cbindgen, made it a default feature, and disabled default features for the docs.rs build with a special section in Cargo.toml:

[package.metadata.docs.rs]
no-default-features = true

Probably this happens because the build script cannot write to OUT_DIR there, so you may want to try writing the output into another directory.

Postscriptum

This should be enough to start writing FFI bindings for your crate; for additional information you may also want to check the following links:

I want to express my gratitude to Moongoodboy{K} and sebk from the #rust IRC channel for proof-reading and invaluable help.


  1. https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#calling-an-unsafe-function-or-method
  2. https://doc.rust-lang.org/reference/linkage.html
  3. https://www.matt-harrison.com/building-and-using-a-sys-crate-with-rust-lets-make-a-node-clone-well-kind-of/
  4. It can’t, unless explicitly defined with #[repr(C)].
  5. With &* you dereference the raw pointer and take a reference in one operation, but in the non-working example you can’t move the value out from behind a raw pointer.
  6. Of course, C strings can be in any encoding, but they are usually assumed to be UTF-8; otherwise things quickly become a mess.