EXPLAIN Query with Ecto

In a previous post, I gave a brief introduction to understanding the results of an EXPLAIN query (from postgres, at least). In this post, I cover generating an EXPLAINquery with Ecto.

Explaining

Fortunately, Ecto provides an explain function, Ecto.Adapters.SQL.explain/4.

The function is provided in the adapters module because not all database management systems support EXPLAIN capabilities. explain/4 will leverage a DBMS-specific adapter for generating the query:

Executes an EXPLAIN statement or similar for the given query according to its kind and the adapter in the given repository.

The first argument to Ecto.Adapters.SQL.explain/4 is a Repo module. There is also a convenience function, Repo.explain/3 which alleviates the need to pass in the Repo.

iex()> h Repo.explain/3

     def explain(operation, queryable, opts \\ [])

A convenience function for SQL-based repositories that executes
an EXPLAIN statement or similar depending on the adapter to obtain
statistics for the given query.

See Ecto.Adapters.SQL.explain/4 for more information.

Example

Set up

To demonstrate, we will create the schemas needed for a blog, Post and Comment, which are in a one-to-many relationship.

Post Schema

defmodule SampleBlog.Blog.Post do
  use Ecto.Schema
  import Ecto.Changeset

  alias SampleBlog.Blog.Comment

  schema "blog_posts" do
    field :title, :string

    has_many :comments, Comment

    timestamps()
  end

  @doc false
  def changeset(post, attrs) do
    post
    |> cast(attrs, [:title, :inserted_at])
    |> validate_required([:title])
  end
end

Comment Schema

defmodule SampleBlog.Blog.Comment do
  use Ecto.Schema
  import Ecto.Changeset

  alias SampleBlog.Blog.Post

  schema "blog_comments" do
    field :body, :string

    belongs_to :post, Post

    timestamps()
  end

  @doc false
  def changeset(comments, attrs) do
    comments
    |> cast(attrs, [:body, :post_id])
    |> validate_required([:body, :post_id])
  end
end

Our Query

The query we want to EXPLAIN will be fetching posts with recent activity. Our query will look something like this:

def posts_with_recent_activity() do
  recent =
    "Etc/UTC"
    |> DateTime.now!()
    |> DateTime.add(-60 * 60 * 24, :second)

  from(p in Post)
  |> join(:left, [p], c in assoc(p, :comments))
  |> preload([p, c], comments: c)
  |> where([p, c], p.inserted_at > ^recent or
                   c.inserted_at > ^recent)
  |> order_by([p], desc: p.inserted_at)
  |> Repo.all()
end

In this query, we join the posts with their comments. We also filter both tables based on the inserted_at column as our means of finding recent activity. Finally, we order the results by the posts' inserted_at column; this simply adds more details to our EXPLAIN result

EXPLAIN versus IO.inspect

A common debugging technique used in Elixir is to insert an IO.inspect in-between steps of a pipeline. Unfortunately, this does work for the explain function because it expects to receive a queryable (so it will not work at the end of the pipeline) and returns a string instead of what was passed in (so we break the pipeline or lose our results).

If we only care about the EXPLAIN results we could remove our call to Repo.all and pass our queryable to the explain function.

def posts_with_recent_activity() do
  recent =
    "Etc/UTC"
    |> DateTime.now!()
    |> DateTime.add(-60 * 60 * 24, :second)


  queryable =
    from(p in Post)
    |> join(:left, [p], c in assoc(p, :comments))
    |> preload([p, c], comments: c)
    |> where([p, c], p.inserted_at > ^recent or
                     c.inserted_at > ^recent)
    |> order_by([p], desc: p.inserted_at)
    # Note we no longer call Repo.all

  SampleBlog.Repo
  |> Ecto.Adapters.SQL.explain(:all, queryable)
  |> IO.puts()
end

Since the result of explain is a string containing the EXPLAIN results, I recommend IO.puts over IO.inspect because it will print the newlines and better format the result.

In practice, making this change can be a bit of a pain. By no longer returning the result of your query, you will break the consumers of this function. This breaking strategy works for a quick peek into what the database is doing. However, if you are tuning the query, you may want to run this multiple times. Returning the query results may be preferable in this case as it will allow you to validate the results continue to be correct.

Preserving the Pipeline

To preserve the pipeline and avoid copying and pasting my query, I turn to Kernel.tap/2. tap/2 will pipe its first argument as the argument to the function you pass in as the second argument. Rather than passing on the result of the function, tap/2 returns the original value; this allows us to pipe in the query, explain it, and then pass it, unchanged, to our query function. Let's see what this looks like:

def posts_with_recent_activity() do
  recent =
    "Etc/UTC"
    |> DateTime.now!()
    |> DateTime.add(-60 * 60 * 24, :second)


  from(p in Post)
  |> join(:left, [p], c in assoc(p, :comments))
  |> preload([p, c], comments: c)
  |> where([p, c], p.inserted_at > ^recent or
                   c.inserted_at > ^recent)
  |> order_by([p], desc: p.inserted_at)
  |> tap(fn query ->
    SampleBlog.Repo
    |> Ecto.Adapters.SQL.explain(:all, query)
    |> IO.puts()
  end)
  |> Repo.all()
end

Now, we will continue to receive the actual response from our query and have our EXPLAIN statement logged as well.

Blog.posts_with_recent_activity()
|> IO.inspect(label: "Result")
-- explain statement
Sort  (cost=56.17..57.39 rows=489 width=604)
  Sort Key: b0.inserted_at DESC
  ->  Hash Right Join  (cost=13.15..34.32 rows=489 width=604)
        Hash Cond: (b1.post_id = b0.id)
        Filter: ((b0.inserted_at > '2021-12-29 20:54:38'::timestamp without time zone) OR (b1.inserted_at > '2021-12-29 20:54:38'::timestamp without time zone))
        ->  Seq Scan on blog_comments b1  (cost=0.00..18.80 rows=880 width=64)
        ->  Hash  (cost=11.40..11.40 rows=140 width=540)
              ->  Seq Scan on blog_posts b0  (cost=0.00..11.40 rows=140 width=540)
# IO.inspect
Result: [
  %SampleBlog.Blog.Post{
    __meta__: #Ecto.Schema.Metadata<:loaded, "blog_posts">,
    comments: [
      %SampleBlog.Blog.Comment{
        __meta__: #Ecto.Schema.Metadata<:loaded, "blog_comments">,
        body: "First!",
        id: 31,
        inserted_at: ~N[2021-12-30 20:54:38],
        post: #Ecto.Association.NotLoaded<association :post is not loaded>,
        post_id: 36,
        updated_at: ~N[2021-12-30 20:54:38]
      }
    ],
    id: 36,
    inserted_at: ~N[2021-12-30 20:54:38],
    title: "My first post",
    updated_at: ~N[2021-12-30 20:54:38]
  }
]

The caveats of EXPLAIN aside, I have this to be a helpful tool when tuning a query. Thanks to the tap function, I can continue to confirm the results of my query are correct while printing out the query plan along the way.


Notice something wrong? Please consider proposing an edit or opening an issue.