Adding Randomness to Test Data

While tests can help improve the robustness of our applications, example-based testing has an inherent limitation: the sample data we choose as examples. In this post, we will look at reducing that limitation by adding randomness to our tests. The examples use FactoryBot, but similar concepts apply to other testing tools, such as Rails' fixtures.

Sequences

Inline

A commonly used source of variation in FactoryBot is sequences. When defining a factory, you can use the sequence method to run a block of code, which receives an ever-increasing number, each time an attribute is generated (as opposed to using a hard-coded value). These are known as inline sequences.

You may already be using inline sequences to avoid triggering uniqueness constraints. For example, if a User record requires a unique username, you can use a sequence to append an ever-increasing number to the end of the generated username.

factory :user do
  sequence(:username) { |n| "user-#{n}" }
end

Now, any time we build a user, it will have a unique username:

3.times { puts FactoryBot.build(:user).inspect }

=> <User @username="user-1">
=> <User @username="user-2">
=> <User @username="user-3">
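Under the hood, an inline sequence is essentially a counter paired with a block. The plain-Ruby sketch below uses a hypothetical SimpleSequence class (invented for illustration, not FactoryBot's actual implementation) to show why each generated value is unique: the counter advances on every call, and the block formats the current number.

```ruby
# Hypothetical, simplified stand-in for a FactoryBot inline sequence.
# Not FactoryBot's real implementation; it only models the "counter + block" idea.
class SimpleSequence
  def initialize(start = 1, &block)
    @next_value = start
    @block = block
  end

  # Formats the current counter value with the block, then advances the counter.
  def generate
    value = @block.call(@next_value)
    @next_value += 1
    value
  end
end

username = SimpleSequence.new { |n| "user-#{n}" }
usernames = Array.new(3) { username.generate }
puts usernames.inspect # prints ["user-1", "user-2", "user-3"]
```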

Global

FactoryBot also supports global sequences. These are named sequences that you can reference across factories, providing unique values across models and attributes. Generating sample email addresses is a common example, but global sequences can generate any type of data. For example, at work, we have a simple string sequence that returns the word "string" with a number appended.

FactoryBot.define do
  sequence :string do |n|
    "string #{n}"
  end

  factory :user do
    first_name { generate(:string) }
    last_name { generate(:string) }
  end
end

3.times { puts FactoryBot.build(:user).inspect }

=> <User @first_name="string 1", @last_name="string 2">
=> <User @first_name="string 3", @last_name="string 4">
=> <User @first_name="string 5", @last_name="string 6">

While this does not provide realistic data, it does add some variation to the objects you are creating. It also signals to the team that we do not care about the specific value of an attribute and simply want a string.

Randomness

Sequences provide an easy starting point for introducing randomness into your system. Every time your tests run in a different order, a given test may receive a value from a different point in the sequence, resulting in slightly different attributes.
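To see why test order matters, here is a hypothetical plain-Ruby sketch. A single shared counter stands in for a global sequence, and two made-up "tests" each draw one value from it. Running them in a different order hands each test a different value. The next_string lambda, the test names, and run_in_order are invented for illustration.

```ruby
# A single shared counter, standing in for a global sequence such as :string.
counter = 0
next_string = lambda do
  counter += 1
  "string #{counter}"
end

# Two hypothetical tests, each building one record with the shared sequence.
tests = {
  "creates a user" => -> { next_string.call },
  "renames a user" => -> { next_string.call }
}

# Runs the tests in the given order and records the value each one received.
def run_in_order(tests, order)
  order.to_h { |name| [name, tests.fetch(name).call] }
end

first_run = run_in_order(tests, ["creates a user", "renames a user"])
counter = 0 # reset the counter, as a fresh test process would
second_run = run_in_order(tests, ["renames a user", "creates a user"])

puts first_run["creates a user"]  # prints: string 1
puts second_run["creates a user"] # prints: string 2
```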

While easy to get started with, sequences are limited: every value follows the same template. In the next section, we will explore introducing more variation into our values.

Sampling Options

Another form of randomness I like to use applies to attributes with a fixed set of valid values (e.g., enums). Rather than having the factory always set the attribute to the same value, we can pick a random option each time a new instance is generated.

Let's imagine a class representing a pizza, and the pizza can have toppings. Our restaurant offers a fixed set of toppings, so we have a constant that lists the available choices.

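class Pizza
  AVAILABLE_TOPPINGS = %w[
    bacon
    jalapenos
    onions
    pepperoni
    peppers
    pineapple
  ]

  attr_reader :toppings

  # Default argument lets FactoryBot call Pizza.new without arguments
  def initialize(toppings = [])
    self.toppings = toppings
  end

  # Wraps a single topping (or nil) in an Array so toppings is always a list
  def toppings=(toppings)
    @toppings = Array(toppings)
  end
end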

In our factory, we can choose a random entry from our list of available toppings using Ruby's Array#sample method.

FactoryBot.define do
  factory :pizza do
    toppings { Pizza::AVAILABLE_TOPPINGS.sample }
  end
end

This results in a different topping every time we use our factory to create a new pizza.

3.times { puts FactoryBot.build(:pizza).inspect }

=> <Pizza @toppings=["jalapenos"]>
=> <Pizza @toppings=["onions"]>
=> <Pizza @toppings=["pineapple"]>
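Because Array#sample draws from Ruby's global random number generator, these picks are only "random" until the generator is seeded. The sketch below uses Kernel#srand to show that the same seed yields the same picks, which is how a failure caused by a particular random value can be replayed, assuming your test setup seeds Ruby's generator from a known value. The TOPPINGS constant simply mirrors Pizza::AVAILABLE_TOPPINGS, and the seed 1234 is arbitrary.

```ruby
TOPPINGS = %w[bacon jalapenos onions pepperoni peppers pineapple].freeze

# Seed the global generator, then take three random picks.
srand(1234)
first_picks = Array.new(3) { TOPPINGS.sample }

# Re-seeding with the same value replays exactly the same picks.
srand(1234)
replayed_picks = Array.new(3) { TOPPINGS.sample }

puts first_picks == replayed_picks # prints: true
```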

Random Amount of Random

Often, when dealing with enum-like data, your attribute will be a single value. In these cases, Array#sample is all you will need. However, in our pizza example, we can have no toppings, one topping, or many toppings. A limitation of our current factory is that it will always generate a Pizza with a single topping.

Herein lies an issue with example-based testing - it relies on the examples you remember to include. We may not think to test zero, one, and many toppings, and, even if we do, we may only do it in a single scenario. What if, instead, the factory always returned a random number of random toppings? That would add some variability to all tests that use this factory.

We can update the toppings attribute in our factory to build a random number of pizza toppings - some number between no toppings and the total number of available toppings.

FactoryBot.define do
  factory :pizza do
    toppings do
      Array.new(rand(0..Pizza::AVAILABLE_TOPPINGS.size)) do
        Pizza::AVAILABLE_TOPPINGS.sample
      end
    end
  end
end

Array.new(rand(0..Pizza::AVAILABLE_TOPPINGS.size)) picks a random number between zero and the number of available toppings, inclusive (rand(0..Pizza::AVAILABLE_TOPPINGS.size)), and creates an Array of that size. We then use Array.new's block argument to generate each entry in the new array, using the same code we had before, Pizza::AVAILABLE_TOPPINGS.sample.
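Stripped of FactoryBot, the toppings block is ordinary Ruby. The sketch below pulls it into a hypothetical random_toppings helper and generates many toppings lists to check the two properties the factory relies on: the count always falls between zero and the number of available toppings, and every entry comes from the list. Injecting a Random instance is an addition not in the original factory; it only makes the run repeatable.

```ruby
AVAILABLE_TOPPINGS = %w[bacon jalapenos onions pepperoni peppers pineapple].freeze

# Hypothetical helper mirroring the factory's toppings block.
# `rng` is injected for repeatability; the factory itself uses the global generator.
def random_toppings(available, rng)
  Array.new(rng.rand(0..available.size)) { available.sample(random: rng) }
end

rng = Random.new(42)
samples = Array.new(1_000) { random_toppings(AVAILABLE_TOPPINGS, rng) }

sizes = samples.map(&:size)
puts "sizes seen: #{sizes.min}..#{sizes.max}"
puts samples.flatten.all? { |t| AVAILABLE_TOPPINGS.include?(t) } # prints: true
```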

Now, when we generate our pizzas, each one gets a random number of random toppings.

3.times { puts FactoryBot.build(:pizza).inspect }

<Pizza @toppings=["peppers", "bacon"]>
<Pizza @toppings=["peppers"]>
<Pizza @toppings=["pepperoni", "pineapple", "bacon", "onions"]>

Valid Amount of Random

Something to be careful of when dealing with randomness is that you could end up with invalid data. As shown below, our current factory doesn't do anything to protect against having multiple entries of the same topping.

3.times { puts FactoryBot.build(:pizza).inspect }

<Pizza @toppings=["peppers", "peppers", "bacon", "peppers", "peppers", "onions"]>
<Pizza @toppings=["peppers", "peppers"]>
<Pizza @toppings=["onions", "bacon", "peppers", "jalapenos", "pineapple"]>

Depending on your model, this may or may not matter (maybe people can order a quadruple serving of peppers if they want). Be warned - if you're not careful, adding randomness to your tests can result in invalid objects. Sometimes, these invalid objects can help you catch bugs and improve your validations. Unfortunately, it may also be the case that many forms of invalid data do not result in a test failure. In these cases, you will have passing tests with invalid data, not an ideal combination.
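One way to keep randomness from quietly producing invalid objects is to check the invariant you care about, either in the test itself or in a model validation. The sketch below uses a hypothetical duplicate_toppings helper to flag repeated toppings in lists like the ones above; it is illustrative only, since the Pizza class in this post defines no validations.

```ruby
# Hypothetical check: returns every topping that appears more than once.
def duplicate_toppings(toppings)
  toppings.tally.select { |_topping, count| count > 1 }.keys
end

generated = [
  %w[peppers peppers bacon peppers peppers onions],
  %w[peppers peppers],
  %w[onions bacon peppers jalapenos pineapple]
]

generated.each do |toppings|
  dupes = duplicate_toppings(toppings)
  puts dupes.empty? ? "ok" : "duplicates: #{dupes.join(', ')}"
end
# prints:
# duplicates: peppers
# duplicates: peppers
# ok
```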

For our pizza example, let's say that we want our factory to result in a unique list of toppings. To do this, we can update how we create the toppings list in our factory to leverage a Set.

FactoryBot.define do
  factory :pizza do
    toppings do
      # A Set ignores duplicates, so the resulting list is unique
      topping_set = Set.new
      topping_count = rand(0..Pizza::AVAILABLE_TOPPINGS.size)

      while topping_set.size < topping_count
        topping_set << Pizza::AVAILABLE_TOPPINGS.sample
      end

      topping_set.to_a
    end
  end
end

3.times { puts FactoryBot.build(:pizza).inspect }

<Pizza @toppings=["onions", "peppers", "pineapple", "bacon"]>
<Pizza @toppings=["pineapple", "peppers", "bacon", "pepperoni", "onions", "jalapenos"]>
<Pizza @toppings=[]>
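As an alternative to the Set loop, Array#sample also accepts a count and returns that many elements taken from distinct positions in the array, so a list of unique toppings never produces a repeat. A minimal plain-Ruby sketch of the same logic, with a hypothetical unique_random_toppings helper:

```ruby
AVAILABLE_TOPPINGS = %w[bacon jalapenos onions pepperoni peppers pineapple].freeze

# sample(n) picks n elements from distinct indexes, so no topping repeats
# as long as AVAILABLE_TOPPINGS itself contains no duplicates.
def unique_random_toppings(available = AVAILABLE_TOPPINGS)
  available.sample(rand(0..available.size))
end

5.times { puts unique_random_toppings.inspect }
```

Inside the factory, the toppings block could then become Pizza::AVAILABLE_TOPPINGS.sample(rand(0..Pizza::AVAILABLE_TOPPINGS.size)), which also avoids re-drawing toppings that were already picked.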

Delegating Randomness

The final suggestion for introducing randomness into your system relies on a gem, Faker. Faker provides hundreds of modules, each themed around a topic, that generate fake data.

With Faker, you can get names, email addresses, physical addresses, quotes, and more.

Faker::Name.name
=> "Marquerite Legros CPA"

Faker::Internet.email
=> "lawana_lesch@donnelly.com"

Faker::TvShows::TheITCrowd.quote
=> "Well, I'm the boss... Head Honcho. El Numero Uno. Mr. Big. The Godfather. Lord of the Rings. The Bourne... Identity. Er... Taxi Driver. Jaws. I forgot the question quite a while back. Who are you, again?"

Faker::TvShows::SiliconValley.url
=> "http://raviga.com"

Faker has an advantage over our sequences example: rather than reusing the same basic template, it pulls from a large list of possible values. Even better, many of its generators produce fairly complex sample data. Names, for example, can include prefixes, suffixes, and middle names; these variations could help catch issues in code that does not handle them.
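To make that concrete, Faker produced "Marquerite Legros CPA" in the example above. A hypothetical helper that assumes every name is exactly "First Last" would quietly return the suffix as the last name, the kind of bug a fixed "Jane Doe" example would never surface.

```ruby
# Hypothetical helper that assumes names are always "First Last".
def naive_last_name(full_name)
  full_name.split(" ").last
end

puts naive_last_name("Jane Doe")              # prints: Doe
puts naive_last_name("Marquerite Legros CPA") # prints: CPA (a suffix, not a last name)
```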

Like all of our options so far, Faker does not go as far as a property-based testing tool would. Its goal is to provide realistic data, so you will not get test data that looks like someone rolled their head across the keyboard. That said, its realistic data will likely provide more variation than you would come up with on your own.

Conclusion

Adding some randomness to your tests can help catch bugs. By creating scenarios you may not have considered when designing a solution, random data can reveal tests that pass with one set of data and fail with another. The options vary in how much randomness they add and how likely they are to catch corner cases, but some of those covered in this post are cheap enough that I use them by default.

I hope you can adopt the practices in this post to shine a light on some of the bugs in your applications.

