Unsafe Rust

Dealing with Unsafe Rust

I’ve been writing a lot of safe Rust code so far, but sometimes we need to dive into unsafe territory to achieve higher performance.

I’m quite familiar with GPU programming, and modifying a vector in place with GPU threads is a very common operation there. The basic idea is (very roughly) to assign a thread to each item in the vector and let each thread modify its own item. Since a single thread is responsible for each item, we don’t need locks.

In place means that we do not copy the vector; we change the original.
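
To make the distinction concrete, here is a minimal single-threaded sketch. The function names `double_in_place` and `double_copy` are made up for this illustration; they only contrast overwriting the caller's buffer with allocating a new one.

```rust
/// Doubles every element in place: the caller's buffer is overwritten.
fn double_in_place(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

/// Doubles every element into a freshly allocated vector; the input is untouched.
fn double_copy(values: &[i32]) -> Vec<i32> {
    values.iter().map(|v| v * 2).collect()
}

fn main() {
    let mut data = vec![1, 2, 3];

    // The copying version leaves `data` as it was.
    let copied = double_copy(&data);
    println!("original: {:?}, copy: {:?}", data, copied);

    // The in-place version changes `data` itself.
    double_in_place(&mut data);
    println!("original after in-place update: {:?}", data);
}
```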

Let’s try to see how we can achieve the same in Rust…

Rust Challenge: Modify vector in place without locks

Astute readers may think of potential problems beforehand. Let me list them:

  1. The vector that we are going to modify in place must be mutable.
  2. In Rust, the borrow checker allows only one mutable reference at a time, so we cannot simply pass the vector to multiple threads.
  3. Neither passing the vector to the threads by mutable reference (&mut) nor moving (move) it into multiple threads will work; the borrow checker rejects both approaches.
  4. We can send the vector to threads if we wrap it in Arc<Mutex<Vec<T>>>, which gives us shared ownership and a mutex lock. But this comes with a performance overhead. And remember, we don’t want locks.
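  5. So the route we take here is to dive into unsafe Rust! (Safe Rust can also hand out disjoint mutable pieces of a slice, for example with chunks_mut and scoped threads, but this post is about the raw-pointer approach.)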
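
For comparison, the locked approach from item 4 could look roughly like the sketch below. The helper name `double_with_mutex` is hypothetical. Every thread has to take the same lock before touching its element, which is exactly the serialization we want to avoid.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Doubles every element using one thread per element and a shared mutex.
/// (Hypothetical helper, written only to illustrate the locked approach.)
fn double_with_mutex(values: Vec<i32>) -> Vec<i32> {
    let len = values.len();
    let shared = Arc::new(Mutex::new(values));
    let mut handles = Vec::with_capacity(len);

    for idx in 0..len {
        // Each thread gets its own reference-counted handle to the same vector.
        let shared = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            // Only one thread at a time can hold the guard.
            let mut guard = shared.lock().unwrap();
            guard[idx] *= 2;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // All worker handles are gone by now, so the Arc can be unwrapped again.
    Arc::try_unwrap(shared).unwrap().into_inner().unwrap()
}

fn main() {
    let result = double_with_mutex(vec![1, 2, 3]);
    println!("Modified vector: {:?}", result);
}
```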

Naive trial

use std::thread;

fn modify_vector_chunk(thread_id: usize, vec_ptr: *mut i32) {
    // SAFETY: each thread corresponds to a single element of the vector
    // thus, there won't be a data-race
    unsafe {
        let ptr = vec_ptr.add(thread_id);
        *ptr *= 2;
    }
}

fn main() {
    // Create the vector and fill it with initial values
    let mut my_vector = vec![1, 2, 3];

    // Print the initial vector
    println!("Initial vector: {:?}", my_vector);

    // Spawn one thread per element, handing each thread the raw pointer
    let mut thread_handles = vec![];
    let vec_ptr = my_vector.as_mut_ptr();

    for thread_id in 0..my_vector.len() {
        let handle = thread::spawn(move || {
            modify_vector_chunk(thread_id, vec_ptr);
            // no need to clone `vec_ptr` since it's `Copy`
        });
        thread_handles.push(handle);
    }

    // Wait for all threads to finish
    for handle in thread_handles {
        handle.join().unwrap();
    }

    // Print the modified vector
    println!("Modified vector: {:?}", my_vector);
}

This makes the compiler angry, due to:

`*mut i32` cannot be sent between threads safely
within `[closure@src/main.rs:74:36: 74:43]`, the trait `Send` is not implemented for `*mut i32`

Ah… Yes. If we want to send something to another thread, it must implement the Send trait. And raw pointers don’t implement Send.

That’s annoying, because we cannot implement Send for types that we did not define, due to the orphan rule (or coherence, if you will).

Quoting from the Rust book:

But we can’t implement external traits on external types. For example, we can’t implement the Display trait on Vec<T> within our aggregator crate, because Display and Vec<T> are both defined in the standard library and aren’t local to our aggregator crate. This restriction is part of a property called coherence, and more specifically the orphan rule, so named because the parent type is not present. This rule ensures that other people’s code can’t break your code and vice versa. Without the rule, two crates could implement the same trait for the same type, and Rust wouldn’t know which implementation to use.
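
To make the quoted rule concrete, here is a small sketch built around the same Display and Vec<T> pairing. The `Wrapper` type is a hypothetical local type introduced only for this illustration: the direct impl (left as a comment) is rejected, while the impl on the local wrapper is accepted.

```rust
use std::fmt;

// Rejected by the orphan rule: both the trait and the type are external.
// impl fmt::Display for Vec<String> { /* ... */ }

// A local type that wraps the external one (hypothetical name).
struct Wrapper(Vec<String>);

// Allowed: the trait is external, but `Wrapper` is defined in this crate.
impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec![String::from("hello"), String::from("world")]);
    println!("w = {}", w);
}
```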

Implementing Send for raw pointers

One way to implement an external trait for an external type is to wrap the external type in a custom type that is ours. We are then implementing an external trait for a local type, which is allowed :)

use std::thread;

struct RawPointerWrapper {
    raw: *mut i32,
}

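// SAFETY: we promise that each thread only touches its own element
// and that the vector outlives every thread that holds this pointer
unsafe impl Send for RawPointerWrapper {}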

fn modify_vector_chunk(thread_id: usize, vec_ptr: RawPointerWrapper) {
    // SAFETY: each thread corresponds to a single element of the vector
    // thus, there won't be a data-race
    unsafe {
        let ptr = vec_ptr.raw.add(thread_id);
        *ptr *= 2;
    }
}

fn main() {
    // Create the vector and fill it with initial values
    let mut my_vector = vec![1, 2, 3];

    // Print the initial vector
    println!("Initial vector: {:?}", my_vector);

    // Spawn one thread per element, each with its own wrapped raw pointer
    let mut thread_handles = vec![];

    for thread_id in 0..my_vector.len() {
        let raw_pointer_struct = RawPointerWrapper {
            raw: my_vector.as_mut_ptr(),
        };
        let handle = thread::spawn(move || {
            modify_vector_chunk(thread_id, raw_pointer_struct);
        });
        thread_handles.push(handle);
    }

    // Wait for all threads to finish
    for handle in thread_handles {
        handle.join().unwrap();
    }

    // Print the modified vector
    println!("Modified vector: {:?}", my_vector);
}
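
One possible variation on the listing above, offered as a sketch rather than the definitive form: if the wrapper derives Copy, the pointer can be taken once before the loop and copied into every thread. The names `SendPtr`, `double_at` and `double_all` are hypothetical. Note that the wrapper is passed to `double_at` as a whole; with Rust 2021 closures, using only the inner field inside the closure would capture the bare raw pointer, which is not Send.

```rust
use std::thread;

// Hypothetical wrapper; `Copy` lets every thread receive its own copy of the pointer.
#[derive(Clone, Copy)]
struct SendPtr(*mut i32);

// SAFETY: assumed sound only because each thread touches a distinct index and
// all threads are joined before the vector is used again.
unsafe impl Send for SendPtr {}

fn double_at(ptr: SendPtr, idx: usize) {
    // SAFETY: the caller guarantees `idx` is in bounds and unique to this thread.
    unsafe {
        *ptr.0.add(idx) *= 2;
    }
}

fn double_all(values: &mut Vec<i32>) {
    // Take the pointer once; it stays valid because the vector is not resized.
    let ptr = SendPtr(values.as_mut_ptr());
    let mut handles = Vec::with_capacity(values.len());

    for idx in 0..values.len() {
        // The whole wrapper is moved (copied) into the closure.
        handles.push(thread::spawn(move || double_at(ptr, idx)));
    }

    // Join every thread before the mutable borrow of `values` ends.
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let mut my_vector = vec![1, 2, 3];
    double_all(&mut my_vector);
    println!("Modified vector: {:?}", my_vector);
}
```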

And that’s simply it! Let’s recap what we did:

  1. We planned ahead for what the compiler might be angry about:
    1. having multiple mutable references to a vector would be problematic
    2. we didn’t want locks
    3. the option we chose was raw pointers (unsafe)
  2. We used unsafe to dereference raw pointers, so that we could avoid locks and get multiple mutable handles to our vector.
  3. We wrapped our raw pointer in a custom struct so that we could write unsafe impl Send for the wrapper, allowing it to be sent to threads.
  4. Voilà!