Rules of thumb for Rust code

I really like the Rust programming language, but I realize that it's quite challenging to get started with. One thing I found helpful was to develop a few rules of thumb that I could follow in most cases. This helped reduce the cognitive load of designing my Rust programs by giving me good default behaviors to rely on. These are emphatically not strict rules; you not only should but need to break them in many cases. But following these rules when you don't have a good reason not to should help make your first pass at a new program a good foundation to build on.

For this post, I'm assuming that you have learned enough Rust to know the basic terminology, such as a move vs. a reference, what a trait is, what lifetimes are and why they matter, and so on. That being said, my three rules of thumb are:

  1. Prefer immutable references for (non-copy) function inputs
  2. Make your custom structs copy and clone
  3. Use owned types in structs as much as possible

Prefer immutable references for (non-copy) function inputs

What this means: When writing a function that takes an argument like String or Vec that follows move semantics, it's best to accept the inputs as references (or better, slices) whenever possible. If you were writing a function that took a vector of bytes and parsed that into a new data structure, you could write it several ways:

// using move semantics
fn parse_byte_stream_move(bytes: Vec<u8>) -> MyDataStruct {
    ...
}

// using a slice reference
fn parse_byte_stream_slice(bytes: &[u8]) -> MyDataStruct {
    ...
}

This rule of thumb says do the second. However, if you're taking something that has the Copy trait - primitive types like booleans, integers, floats, characters are the most common examples - don't ask for a reference, just take the regular value (so make the input type e.g. i64 not &i64). Copy types are never moved, so it's actually more flexible to pass the value, rather than a reference.
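To make the Copy case concrete, here is a minimal sketch. The functions scale and scale_ref are hypothetical names made up for this illustration; the point is only that the by-value version is easier to call and costs the caller nothing.

```rust
// Hypothetical helpers for illustration only.

// Takes Copy types by value: the caller keeps its own copy.
fn scale(value: i64, factor: i64) -> i64 {
    value * factor
}

// Takes references to Copy types: works, but callers have to add `&`
// and the body has to dereference.
fn scale_ref(value: &i64, factor: &i64) -> i64 {
    *value * *factor
}

fn main() {
    let x: i64 = 7;

    let by_value = scale(x, 3);
    let by_ref = scale_ref(&x, &3);

    // `x` is still usable after both calls, because i64 is Copy.
    println!("x = {x}, by_value = {by_value}, by_ref = {by_ref}");
}
```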

Why this helps: Immutable references are the most forgiving way of passing data. If you used a move (like the first example), then whenever you call this function elsewhere in your code, you give up whatever you pass into it. You wouldn't be able to use that Vec<u8> after the call to parse_byte_stream_move. If all this function needs to do is iterate through the vector and read the bytes, then there's no reason for it to take away your ability to use that vector after the function call. Having the input be a reference gives you the most flexibility when you use this function elsewhere in your code.
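Here is a small sketch of the difference from the caller's side. The two checksum functions are hypothetical stand-ins for the parse functions above (they just add up the bytes), so that the example compiles on its own.

```rust
// Hypothetical stand-ins for the parse functions; they only sum the bytes.

// Takes ownership: the caller's vector is gone after the call.
fn checksum_move(bytes: Vec<u8>) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

// Borrows: the caller keeps the vector.
fn checksum_slice(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let data: Vec<u8> = vec![1, 2, 3, 4];

    // Borrowing leaves `data` usable afterwards.
    let borrowed_sum = checksum_slice(&data);
    println!("sum = {borrowed_sum}, len = {}", data.len());

    // Moving gives `data` away to the function.
    let moved_sum = checksum_move(data);
    println!("sum = {moved_sum}");

    // Uncommenting the next line fails to compile, because `data` was moved:
    // println!("{}", data.len());
}
```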

Vec and String inputs can be made more general by having the input be a slice, not just a reference. That means making the type be &[T] or &str rather than &Vec<T> or &String. Using slices means that you could always elect to pass part of a vector or string, rather than the whole thing.
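As a rough illustration, the sketch below calls slice-taking functions with both whole values and parts of them. The function names count_zeros and shout are invented for this example.

```rust
// Hypothetical helpers for illustration only.

// Accepts any contiguous run of bytes, not just a whole Vec<u8>.
fn count_zeros(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == 0).count()
}

// Accepts any string slice, not just a whole String.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let bytes: Vec<u8> = vec![0, 7, 0, 9, 0];
    let greeting = String::from("hello world");

    // Whole vector, then only elements 1, 2 and 3.
    println!("{}", count_zeros(&bytes));
    println!("{}", count_zeros(&bytes[1..4]));

    // Whole string, then only the first five bytes of it.
    println!("{}", shout(&greeting));
    println!("{}", shout(&greeting[..5]));
}
```

Had these functions taken &Vec<u8> and &String instead, the partial calls would not compile without first copying the part into a new Vec or String.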

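If you have a function that accepts a &str and you get a compiler error about this not being the same type as a &String you're trying to pass in, take an explicit slice of the string, like this: &s[..]. The [..] is what explicitly slices the string. (In a plain function call the compiler usually converts &String to &str for you, so this error tends to show up in generic code or when the reference is wrapped in another type such as Option.)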
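A short sketch of the ways to get a &str out of a String follows. The word_count function is a made-up example; as_str is a standard String method that does the same job as &s[..].

```rust
// Hypothetical helper for illustration only.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let s = String::from("three short words");
    let r: &String = &s;

    // In a direct call, a &String is converted to &str automatically.
    let implicit = word_count(r);

    // Two explicit ways of getting a &str covering the whole string.
    let sliced = word_count(&s[..]);
    let via_method = word_count(s.as_str());

    println!("{implicit} {sliced} {via_method}");

    // The automatic conversion does not reach inside other types, so an
    // Option<&String> has to be converted explicitly.
    let maybe: Option<&String> = Some(&s);
    let as_slice: Option<&str> = maybe.map(|x| x.as_str());
    println!("{:?}", as_slice.map(word_count));
}
```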

When to ignore this rule: If a function needs to pass its input along to another function that uses move semantics, then it should take ownership of the value (so use Vec<T> or String as the input type rather than &[T] or &str). It's better to do that than to take a reference and convert it to an owned value inside the function. If the function takes ownership up front, then each time you call it you can decide whether to make a copy first. As an example, consider:

use std::io::Read; // brings read_to_end into scope for File
use std::path::Path;

// Takes the raw data, and parses the first part of it to get some header information
// that describes the rest of the data. It converts that into a HeaderInfo struct, and
// returns the rest of the data as a new vector. Assume that we need to return the
// data part of the original vector as a smaller, owned vector because we can't be sure
// that if we returned a reference to part of the original data that the original data would 
// live long enough. 
fn split_header(bytes: Vec<u8>) -> (HeaderInfo, Vec<u8>) {
    ...
}


fn parse_byte_stream_slice(bytes: &[u8]) -> MyDataStruct {
    // We have no choice - we *must* pass in an owned vector, so
    // we convert the slice into one
    let (header, data) = split_header(bytes.to_owned());
    ...
}


fn load_binary_slice(data_file: &Path) -> MyDataStruct {
    let mut bytes: Vec<u8> = Vec::new();
    let mut f = std::fs::File::open(data_file).unwrap();
    f.read_to_end(&mut bytes).unwrap();

    // Passing a reference has no benefit here - the original bytes will just
    // be dropped when the function returns, and parse_byte_stream_slice ends
    // up creating an extra copy of the bytes for no reason.
    return parse_byte_stream_slice(&bytes);
}


fn parse_byte_stream_move(bytes: Vec<u8>) -> MyDataStruct {
    // Here, no need to make an owned copy - we already own bytes
    let (header, data) = split_header(bytes);
    ...
}


fn load_binary_move(data_file: &Path) -> MyDataStruct {
    let mut bytes: Vec<u8> = Vec::new();
    let mut f = std::fs::File::open(data_file).unwrap();
    f.read_to_end(&mut bytes).unwrap();

    // Since we don't need bytes after this function call, letting the parse function
    // take ownership saves making a copy. If we ever *did* need to use the bytes past
    // the parse function, we could use bytes.clone() to make a second instance to pass in.
    return parse_byte_stream_move(bytes);
}

Another time to ignore this rule is if you need to modify the input. In that case either use a mutable reference or take ownership and return the modified version.
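Both options might look something like the sketch below. The task (dropping trailing zero bytes) and the function names are assumptions chosen just to have something to modify.

```rust
// Hypothetical helpers for illustration only: both strip trailing zero bytes.

// Option 1: borrow mutably and change the caller's vector in place.
fn strip_padding_in_place(bytes: &mut Vec<u8>) {
    while bytes.last() == Some(&0) {
        bytes.pop();
    }
}

// Option 2: take ownership and hand the modified vector back.
fn strip_padding_owned(mut bytes: Vec<u8>) -> Vec<u8> {
    while bytes.last() == Some(&0) {
        bytes.pop();
    }
    bytes
}

fn main() {
    // The caller's binding must be `mut` to lend it out mutably.
    let mut first = vec![1, 2, 3, 0, 0];
    strip_padding_in_place(&mut first);
    println!("{:?}", first);

    // Here the caller's binding does not need to be `mut`.
    let second = vec![4, 5, 0];
    let second = strip_padding_owned(second);
    println!("{:?}", second);
}
```

The mutable-reference version tends to fit when the caller keeps working with the same value; the owned version reads well when the value is being passed through a chain of transformations.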

Make your custom structs copy and clone

What this means: Whenever you define a custom struct, add #[derive(Clone, Copy)] or #[derive(Clone)] to it:

#[derive(Copy, Clone)]
struct Point3d {
    x: f64,
    y: f64,
    z: f64
}

#[derive(Clone)]
struct Player {
    name: String,
    position: Point3d
}

You can derive Copy on any struct whose fields all implement Copy (Copy also requires Clone, which is why the two are derived together). Point3d can be Copy here because the primitive f64 type is Copy. Likewise, you can derive Clone on any struct whose fields all implement Clone. Player can't be Copy because String doesn't implement Copy, but both String and our Point3d struct do implement Clone.

Why this helps: Deriving Copy on a struct means that any time you assign it to another variable, either as x = y or by passing it to a function, the new variable gets a copy of the original, rather than moving ownership to the new variable. This means you don't have to worry about ownership or lifetimes, which simplifies things a great deal.

Likewise, deriving Clone means that we can get a new instance of our struct by calling .clone() on it. If you ever run into a case where you need to pass a variable into a function that will take ownership, but you need to use the variable after that, you can get around this by cloning it. It can also help if you need to get a mutable copy of something that has immutable references.
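Here is a minimal sketch of both behaviors using the Point3d and Player structs from above. The Debug and PartialEq derives and the two functions are additions made only so the example can print and compare things.

```rust
// Debug and PartialEq are extra derives added for this illustration.
#[derive(Debug, Copy, Clone, PartialEq)]
struct Point3d {
    x: f64,
    y: f64,
    z: f64,
}

#[derive(Debug, Clone, PartialEq)]
struct Player {
    name: String,
    position: Point3d,
}

// Hypothetical function that takes a Copy struct by value.
fn distance_from_origin(p: Point3d) -> f64 {
    (p.x * p.x + p.y * p.y + p.z * p.z).sqrt()
}

// Hypothetical function that takes ownership of a non-Copy struct.
fn announce(player: Player) -> String {
    format!("{} is at {:?}", player.name, player.position)
}

fn main() {
    let position = Point3d { x: 3.0, y: 4.0, z: 0.0 };

    // Point3d is Copy, so passing it by value leaves `position` usable.
    let distance = distance_from_origin(position);
    println!("{distance} from origin, still have {:?}", position);

    let player = Player { name: "Ada".to_owned(), position };

    // Player is only Clone, so we clone explicitly to keep our own instance.
    let message = announce(player.clone());
    println!("{message}; still have {}", player.name);
}
```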

When to ignore this rule: You don't want to make big structs that take a lot of bytes Copy, because then every time you pass one to a function, you're making a duplicate of all those bytes. Of course, if you have a struct that consists of enough data that making it Copy would be inefficient, it is probably either:

  1. a struct that contains a string, vector, N-d array, or other type that can't be Copy because it has data on the heap, in which case your struct can't be Copy either,
  2. a struct with a ridiculous number of fields (to make a struct take 1 kB of memory would need 128 f64 fields), in which case you're already going to be miserable populating all those fields, or
  3. a deeply nested struct (one with other structs as fields, which themselves have structs as fields, and so on), in which case again you're going to be miserable referencing a field five or ten levels deep.

So in practice, if you have a struct that can derive Copy but is too big for copying it to be a good idea, you have other design challenges.

For Clone, there are very few types that don't implement Clone. Since deriving it only provides a .clone() method (rather than changing the default behavior like Copy does), there's little reason not to derive Clone whenever you can. The most likely case where you can't is if your struct includes a mutable reference, as those aren't clonable.

Use owned types in structs as much as possible

What this means: When defining a custom struct, default to having its fields be owned types, like String or Vec, instead of references like &str or &[T]:

// prefer this
struct Guest {
    name: String,
    dinner_choice: Dinner // assume Dinner is some enum of the allowed choices
}

// over this
struct RefGuest<'a> {
    name: &'a str,
    dinner_choice: &'a Dinner
}

Why this helps: First, avoiding references as fields in structs saves you from having to deal with specifying named lifetimes (the 'a in the RefGuest example). These aren't too hard to use once you understand them, but they do add some complexity to your code.

Second, when we are defining a custom struct, it's because we need a container for multiple pieces of related data. The majority of the time, that means that it makes sense for the struct to own that data, and for other parts of the code to borrow from the struct (or borrow the whole struct), rather than the other way around.
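To sketch what this looks like in practice, the example below builds an owned Guest from a borrowed line of text and then lends it out. The "name, dinner" input format and the functions load_guest and place_card are assumptions made up for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dinner {
    Salmon,
    Steak,
}

#[derive(Debug, Clone)]
struct Guest {
    name: String,
    dinner_choice: Dinner,
}

// Hypothetical parser for lines like "John Doe, salmon". The returned Guest
// owns its name, so it does not depend on `raw_line` staying alive.
fn load_guest(raw_line: &str) -> Option<Guest> {
    let (name, dinner) = raw_line.split_once(',')?;
    let dinner_choice = match dinner.trim() {
        "salmon" => Dinner::Salmon,
        "steak" => Dinner::Steak,
        _ => return None,
    };
    Some(Guest { name: name.trim().to_owned(), dinner_choice })
}

// Other code borrows the whole struct rather than owning any part of it.
fn place_card(guest: &Guest) -> String {
    format!("{} ({:?})", guest.name, guest.dinner_choice)
}

fn main() {
    let guest = {
        let line = String::from("John Doe, salmon");
        let parsed = load_guest(&line);
        parsed
    }; // `line` is dropped here, but the Guest lives on.

    if let Some(guest) = guest {
        println!("{}", place_card(&guest));
    }
}
```

With the RefGuest version, the struct could not outlive the line it was parsed from, and that restriction would spread to every function that stores or returns one.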

When to ignore this rule: Any time that (a) more than one struct needs to contain the same piece of data, (b) the struct is intended to be short lived, or (c) it really does make sense to have the data stored someplace central in the code. This is definitely the most situational rule. Consider if our Guest needs to track which other guests they are family with (so that they sit together when we make a seating chart, for example). Then you might have:

struct Guest<'a> {
    name: String,
    dinner_choice: Dinner,
    family_members: Vec<&'a Guest<'a>>
}

We can't have a Guest take ownership of other Guests. If John and Jane are family members, then John's instance needs to include Jane in the family_members field and vice versa. Doing that with family_members being a Vec<Guest> will cause all kinds of circularity problems.

Bonus tip

If we think a little further about this guest struct, we'll see another problem. Consider how we might instantiate a list of guests for our event:

// has to be mut so that we can add Jane later
let mut john = Guest{name: "John Doe".to_owned(), dinner_choice: Dinner::Salmon, family_members: Vec::new()};
// we can define Jane as immutable, since we can add John right away
let jane = Guest{name: "Jane Doe".to_owned(), dinner_choice: Dinner::Steak, family_members: vec![&john]};

// error - cannot modify john while there is an immutable reference to him as jane's family member!
john.family_members.push(&jane);

This is a problem with having long-lived immutable references to objects - if we ever need to modify them, we can't. One way to address this is to give each instance a unique ID, and store that, rather than a reference. This would look like:

struct Guest {
    uid: u64,
    name: String,
    dinner_choice: Dinner,
    family_member_uids: Vec<u64>
}

Storing IDs like this saves us from the mutable/immutable reference problem, since u64 values are copyable. It would require extra logic to map IDs to Guest instances, but the added flexibility can be well worth it.
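One possible shape for that extra logic is a HashMap keyed by ID, as in the sketch below. The functions link_family and family_names, and the choice of HashMap as the central store, are assumptions for illustration rather than the only way to do it.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Dinner {
    Salmon,
    Steak,
}

#[derive(Debug, Clone)]
struct Guest {
    uid: u64,
    name: String,
    dinner_choice: Dinner,
    family_member_uids: Vec<u64>,
}

// Hypothetical helper: records a and b as each other's family members.
// Returns false (and changes nothing) if either ID is unknown or a == b.
fn link_family(guests: &mut HashMap<u64, Guest>, a: u64, b: u64) -> bool {
    if a == b || !guests.contains_key(&a) || !guests.contains_key(&b) {
        return false;
    }
    for (from, to) in [(a, b), (b, a)] {
        if let Some(guest) = guests.get_mut(&from) {
            if !guest.family_member_uids.contains(&to) {
                guest.family_member_uids.push(to);
            }
        }
    }
    true
}

// Hypothetical helper: resolves a guest's family IDs back into names.
fn family_names(guests: &HashMap<u64, Guest>, uid: u64) -> Vec<&str> {
    match guests.get(&uid) {
        Some(guest) => guest
            .family_member_uids
            .iter()
            .filter_map(|id| guests.get(id))
            .map(|member| member.name.as_str())
            .collect(),
        None => Vec::new(),
    }
}

fn main() {
    let mut guests: HashMap<u64, Guest> = HashMap::new();
    for (uid, name, dinner_choice) in [(1, "John Doe", Dinner::Salmon), (2, "Jane Doe", Dinner::Steak)] {
        guests.insert(uid, Guest { uid, name: name.to_owned(), dinner_choice, family_member_uids: Vec::new() });
    }

    // Both guests get updated, with no long-lived references in the way.
    link_family(&mut guests, 1, 2);
    println!("{:?}", family_names(&guests, 1));
    println!("{:?}", family_names(&guests, 2));
}
```

The trade-off is that an ID can refer to a guest that no longer exists, which the compiler won't catch for you; the lookup functions have to handle that case themselves.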