Diffs

2026-01-02 #tech

Let's conduct a fun thought experiment. Consider the following sentence:

The dog ran to the store.

Imagine that I told you I changed that sentence to look like this:

The cat ran to the house.

If I asked you what changed between the two sentences, what would you say?

You might tell me that the word "dog" was swapped with "cat" and that the word "store" was swapped with "house". What if I told you that I instead swapped the entire phrase "dog ran to the store" with "cat ran to the house"? Or what if I said that I erased the entire sentence and then wrote "The cat ran to the house."? Is there any difference? And does it matter?

What we're describing here is a diff. A diff is a written formula that describes how a block of text changed from one version to another. Although many diff formats are in use today, we generally use a minus sign (-) to indicate deleted text and a plus sign (+) to indicate inserted text.

The example below is how we might write a diff for the thought experiment above:

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
The [-dog-]{+cat+} ran to the [-store-]{+house+}.

However, the two diffs below are also completely valid ways to describe this change [1].

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
-The dog ran to the store.
+The cat ran to the house.

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
The [-dog ran to the store-]{+cat ran to the house+}.
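As a rough illustration, the line-level form above can be reproduced with Python's standard difflib module. This is only a sketch: the file names 1.txt and 2.txt are labels passed to the function, and no files are actually read.

```python
import difflib

# Each list element is one line of the "file", including its newline.
old_lines = ["The dog ran to the store.\n"]
new_lines = ["The cat ran to the house.\n"]

# unified_diff yields the header lines, the hunk marker, and the -/+ lines.
diff_text = "".join(
    difflib.unified_diff(old_lines, new_lines, fromfile="1.txt", tofile="2.txt")
)
print(diff_text, end="")
```

Because difflib compares whole lines here, it reports the entire sentence as removed and re-added, just like the first of the two diffs above.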

Diffs are incredibly powerful because they let us record individual edits to a file and, from those edits, construct a history of changes. When we ctrl+z/ctrl+y through every single edit in a Microsoft Word document, we're in effect cycling through a list of recorded diffs that track our changes.

Creating an Optimal Diff

In theory, the optimal diff for a given change is the one that requires the fewest edits. Consider the following diffs:

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
-the quick brown fox jumps over the lazy dog.
+The quick brown fox jumps over the lazy dog.

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
[-t-]{+T+}he quick brown fox jumps over the lazy dog.

In this illustration, the second diff is more efficient than the first because it describes the change as a single-letter difference rather than a rewrite of the entire sentence. Neil Fraser has an exceptional write-up on Google Docs' diff optimisation; I'll highlight the key ideas here, but I recommend reading it in full in your own time. Continuing with our original thought experiment, we first strip any common prefix or suffix shared by the two texts.

The dog ran to the store. → The cat ran to the house.
    dog ran to the store. →     cat ran to the house.
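The trimming step can be sketched in a few lines of Python. This is a minimal illustration rather than the actual implementation from Neil Fraser's write-up, and the function name split_common_affixes is made up for this example. Note that comparing character by character also trims the shared "e." at the end of "store." and "house.", so the leftover text is slightly shorter than in the illustration above.

```python
def split_common_affixes(old: str, new: str) -> tuple[str, str, str, str]:
    """Return (prefix, old_middle, new_middle, suffix)."""
    limit = min(len(old), len(new))

    # Count the characters shared at the start of both texts.
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1

    # Count the characters shared at the end, never overlapping the prefix.
    end = 0
    while end < limit - start and old[-1 - end] == new[-1 - end]:
        end += 1

    return (
        old[:start],
        old[start:len(old) - end],
        new[start:len(new) - end],
        old[len(old) - end:],
    )


parts = split_common_affixes(
    "The dog ran to the store.", "The cat ran to the house."
)
print(parts)  # ('The ', 'dog ran to the stor', 'cat ran to the hous', 'e.')
```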

Because the common prefixes and suffixes have been removed, the remaining texts are guaranteed to differ at both their start and their end. In some cases, treating what remains as a single replacement is already the optimal diff and we can stop. However, in this case, we have two edits: "dog" has been replaced with "cat" and "store" has been replaced with "house". To check for more than one individual edit, we break down our remaining text and look for substrings that also appear in the source text. Let's break this down by word.

    dog ran to the store. →     cat ran to the house.
    dog            store  →     cat            house 

The [-dog-]{+cat+} ran to the [-store-]{+house+}.

We've now identified our individual edits and constructed a diff!
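A word-level pass like this one can be approximated with difflib.SequenceMatcher from Python's standard library. This is a sketch under a few assumptions: the tokenizer (words, whitespace runs, single punctuation marks) and the function name word_diff are my own choices, and SequenceMatcher uses its own matching heuristic rather than the strategy from Neil Fraser's write-up.

```python
import re
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> str:
    # Tokens are words, runs of whitespace, or single punctuation marks.
    pattern = r"\w+|\s+|[^\w\s]"
    a, b = re.findall(pattern, old), re.findall(pattern, new)

    out = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        # A "replace" is rendered as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


print(word_diff("The dog ran to the store.", "The cat ran to the house."))
# The [-dog-]{+cat+} ran to the [-store-]{+house+}.
```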

This can be a very computationally expensive operation, and the granularity can be made finer still. Imagine what our diff might look like if we compared every single letter:

    dog            store  →     cat            house 
    dog            st r   →     cat            h us  

The [-dog-]{+cat+} ran to the [-st-]{+h+}o[-r-]{+us+}e.

This diff is more succinct at the cost of human readability. Diffing software tends to default to word or line boundaries when calculating diffs in practice, but individual character diffs still prove incredibly useful in the context of code.
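For the curious, here is one way a character-level diff could be computed: a textbook longest-common-subsequence table, walked from the start of both texts. It is a sketch, not what any particular diff tool does, and the name char_diff is hypothetical. The table has one cell per pair of characters, which hints at why this granularity gets expensive on large inputs.

```python
def char_diff(old: str, new: str) -> str:
    n, m = len(old), len(new)

    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    out, deleted, inserted = [], [], []

    def flush() -> None:
        # Emit pending edits as one deletion group and one insertion group.
        if deleted:
            out.append("[-" + "".join(deleted) + "-]")
            deleted.clear()
        if inserted:
            out.append("{+" + "".join(inserted) + "+}")
            inserted.clear()

    i = j = 0
    while i < n or j < m:
        if i < n and j < m and old[i] == new[j]:
            flush()
            out.append(old[i])
            i += 1
            j += 1
        elif j == m or (i < n and lcs[i + 1][j] >= lcs[i][j + 1]):
            deleted.append(old[i])  # ties prefer deleting first
            i += 1
        else:
            inserted.append(new[j])
            j += 1
    flush()
    return "".join(out)


print(char_diff("The dog ran to the store.", "The cat ran to the house."))
# The [-dog-]{+cat+} ran to the [-st-]{+h+}o[-r-]{+us+}e.
```

Other tie-breaking rules would produce different but equally short diffs, such as keeping the "s" instead of the "o".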

Move Operations

Another wrinkle to consider in diff computation is the concept of a move operation in which a portion of text is moved from one location to another. Consider the following texts:

The orange cat jumped on the mat. → The cat jumped on the orange mat.

With our previous diff algorithm, we might generate the following diff formula:

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
The [-orange -]cat jumped on the {+orange +}mat.

Instead, we can describe the text "orange " as having been moved from one location to another.

--- 1.txt
+++ 2.txt
@@ -1 +1 @@
The ^cat jumped on the orange mat.
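One naive way to spot a move is to diff the two texts and then pair up any deleted run with an identical inserted run. The sketch below does this on whitespace-separated words using difflib.SequenceMatcher; the function name find_moves and the returned tuple shape are assumptions for this example, and real tools use more careful heuristics.

```python
from difflib import SequenceMatcher


def find_moves(old: str, new: str) -> list[tuple[str, int, int]]:
    """Return (text, old_word_index, new_word_index) for each detected move."""
    a, b = old.split(), new.split()

    deleted, inserted = [], []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((a[i1:i2], i1))
        if tag in ("insert", "replace"):
            inserted.append((b[j1:j2], j1))

    # A deletion whose words reappear verbatim as an insertion counts as a move.
    moves = []
    for words, src in deleted:
        for other, dst in inserted:
            if words == other:
                moves.append((" ".join(words), src, dst))
                break
    return moves


print(find_moves(
    "The orange cat jumped on the mat.",
    "The cat jumped on the orange mat.",
))
# [('orange', 1, 5)]
```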

Patches

How are diffs used in the world of software? Before the advent of "pull requests", much online software collaboration happened through diffs called "patches" sent via email. A software patch is a diff formula encapsulated in a file, and it may describe multiple changes to multiple files. While a plain diff summarizes all changes at once, a patch can also record individual commits and the changes each one contains. Consider the following patch generated via git format-patch main --stdout > [PATCHFILE]:

From 92cad9b427905efc98aa2bf588cc25990df6ea8f Mon Sep 17 00:00:00 2001
From: Sam Bossley <sam@bossley.com>
Date: Fri, 2 Jan 2026 15:52:58 -0800
Subject: [PATCH 1/2] feat: upgrade zig to 0.16

---
 build.zig.zon | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build.zig.zon b/build.zig.zon
index 9341714..8aa75bb 100644
--- a/build.zig.zon
+++ b/build.zig.zon
@@ -2,7 +2,7 @@
     .name = .pformat,
     .version = "4.0.0",
     .fingerprint = 0x1e7cdc29e12f7a74,
-    .minimum_zig_version = "0.15.2",
+    .minimum_zig_version = "0.16.0-dev.1859+212968c57",
     .paths = .{
         "LICENSE.txt",
         "build.zig",

From c48d3e428fc9a1d48678ac3ea2336a2f4cd2d080 Mon Sep 17 00:00:00 2001
From: Sam Bossley <sam@bossley.com>
Date: Fri, 2 Jan 2026 15:56:04 -0800
Subject: [PATCH 2/2] update old code

---
 src/write.zig | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/write.zig b/src/write.zig
index cfeee1e..c99a0da 100644
--- a/src/write.zig
+++ b/src/write.zig
@@ -42,12 +42,9 @@ pub fn writeAvoidCollisions(
 }
 
 fn writeFile(input_file: *const fs.File, output_file: *fs.File) void {
-    // manually call linux.copy_file_range instead of
-    // posix.copy_file_range until I update my zig version to include
-    // https://codeberg.org/ziglang/zig/commit/bc512648dbb27d895e69857202b22bc34d27f122
     var offset: i64 = 0;
     const len = std.math.maxInt(usize);
-    _ = std.os.linux.copy_file_range(
+    _ = std.c.copy_file_range(
         input_file.handle,
         &offset,
         output_file.handle,

With this patch, we can concisely identify every edit made in both commits. The patch file can then be applied to a code repository with git am [PATCHFILE]. And because the change is fully encapsulated in a file, we can store the patch and apply it later when the need arises.
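Since a patch file is plain text, it is also easy to inspect with a script. The sketch below lists each commit's subject and the files it touches; summarize_patch is a hypothetical helper, and it assumes single-line subjects and paths without spaces, as in the patch shown here.

```python
import re


def summarize_patch(patch_text: str) -> list[tuple[str, list[str]]]:
    """Return (subject, files) for each commit in a format-patch file."""
    # Each commit begins with a "From <40-character sha> ..." separator line.
    chunks = re.split(r"(?m)^From [0-9a-f]{40} ", patch_text)[1:]

    summary = []
    for chunk in chunks:
        # Drop the "[PATCH n/m] " prefix from the subject when present.
        subject = re.search(r"(?m)^Subject: (?:\[PATCH[^\]]*\] )?(.*)$", chunk)
        files = re.findall(r"(?m)^diff --git a/(\S+) b/\S+$", chunk)
        summary.append((subject.group(1) if subject else "", files))
    return summary
```

Run against the patch above, this would report "feat: upgrade zig to 0.16" touching build.zig.zon and "update old code" touching src/write.zig.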

GitHub

Did you know that GitHub supports both patch files and raw diffs? You can generate an email-formatted patch file from a pull request via the following URL format:

https://github.com/AUTHOR/REPO/pull/NUMBER.patch

Likewise, it's easy to grab the entire diff with the following URL format:

https://github.com/AUTHOR/REPO/pull/NUMBER.diff

[1] I created this diff using GNU diffutils (diff --color -u 1.txt 2.txt), but other diffing software exists. I generally prefer git diffs (git diff --word-diff).