Compare Files with PowerShell: a faster way

Sometimes you need to test whether two files are the same. As files get larger, your scripts take longer, so performance starts to matter. In this article, I'll show how to compare two files using a buffered approach in PowerShell.
When talking about performance, it is better to measure multiple times on multiple systems. In this post I only measured once on a single system, because I'm only interested in relative changes.
Strategies
When you want to compare files, you have some strategies:
Extra performance: use LINQ
PowerShell is a scripting language with native support for comparison, but comparing the buffers with LINQ is way faster, and you can call it straight from PowerShell. We'll use the SequenceEqual method, like this: [System.Linq.Enumerable]::SequenceEqual($one, $two). On my machine this LINQ method was 500x faster than the native comparison.
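To get a feel for the difference, here is a minimal sketch that times both approaches on two in-memory buffers. The helper name Test-BuffersWithLoop and the 1 MB buffer size are my own assumptions for illustration, and the loop is only one possible form of a "native" comparison. The numbers will differ per machine, so treat them as relative.

```powershell
# Two hypothetical 1 MB buffers with identical (all-zero) content.
$one = New-Object byte[] 1048576
$two = New-Object byte[] 1048576

# Hypothetical helper: compares two buffers element by element in plain PowerShell.
function Test-BuffersWithLoop([byte[]] $a, [byte[]] $b) {
    if ($a.Length -ne $b.Length) { return $false }
    for ($i = 0; $i -lt $a.Length; $i++) {
        if ($a[$i] -ne $b[$i]) { return $false }
    }
    return $true
}

# Time both approaches; Measure-Command returns a TimeSpan.
$loopTime = Measure-Command { Test-BuffersWithLoop $one $two }
$linqTime = Measure-Command { [System.Linq.Enumerable]::SequenceEqual($one, $two) }

"Loop: {0:N1} ms" -f $loopTime.TotalMilliseconds
"LINQ: {0:N1} ms" -f $linqTime.TotalMilliseconds
```

Both calls walk the same number of bytes; the difference is that SequenceEqual runs as compiled .NET code, while the loop is interpreted by PowerShell one element at a time.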
FilesAreEqual function
Let's create a function that takes two files, compares them and returns $true when both files are equal:
function FilesAreEqual {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second,
        [uint32] $bufferSize = 524288)

    # Files of different lengths can never be equal, so skip reading them.
    if ( $first.Length -ne $second.Length ) { return $false }
    if ( $bufferSize -eq 0 ) { $bufferSize = 524288 }

    $fs1 = $first.OpenRead()
    $fs2 = $second.OpenRead()

    # One buffer per file, both of the same size.
    $one = New-Object byte[] $bufferSize
    $two = New-Object byte[] $bufferSize
    $equal = $true

    do {
        # Read the next chunk of each file and compare the buffers with LINQ.
        $bytesRead = $fs1.Read($one, 0, $bufferSize)
        $fs2.Read($two, 0, $bufferSize) | out-null

        if ( -Not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
            $equal = $false
        }
    # Stop at the first difference or when the last (partial) chunk was read.
    } while ($equal -and $bytesRead -eq $bufferSize)

    $fs1.Close()
    $fs2.Close()

    return $equal
}
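The function above assumes that every Read call fills the whole buffer until the end of the file. The .NET Stream.Read contract allows a call to return fewer bytes than requested, so here is a more defensive sketch of the same idea. The names Read-Chunk and Test-FileContentEqual are hypothetical, and the try/finally block is my own addition to make sure both streams are closed when something throws. It is a variant under those assumptions, not a replacement for the function above.

```powershell
# Hypothetical helper: keeps reading until the buffer is full or the stream ends.
function Read-Chunk([System.IO.Stream] $stream, [byte[]] $buffer) {
    $total = 0
    while ($total -lt $buffer.Length) {
        $n = $stream.Read($buffer, $total, $buffer.Length - $total)
        if ($n -eq 0) { break }   # end of stream
        $total += $n
    }
    return $total
}

# Hypothetical variant of FilesAreEqual that tolerates short reads.
function Test-FileContentEqual {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second,
        [int] $bufferSize = 524288)

    if ($first.Length -ne $second.Length) { return $false }
    if ($bufferSize -le 0) { $bufferSize = 524288 }

    $one = New-Object byte[] $bufferSize
    $two = New-Object byte[] $bufferSize
    $fs1 = $null
    $fs2 = $null
    try {
        $fs1 = $first.OpenRead()
        $fs2 = $second.OpenRead()
        while ($true) {
            $read1 = Read-Chunk $fs1 $one
            $read2 = Read-Chunk $fs2 $two
            if ($read1 -ne $read2) { return $false }
            if ($read1 -eq 0) { return $true }   # both files fully read

            # Zero the unused tail so leftovers of an earlier chunk are not compared.
            [Array]::Clear($one, $read1, $one.Length - $read1)
            [Array]::Clear($two, $read2, $two.Length - $read2)

            if (-not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
                return $false
            }
        }
    }
    finally {
        # Runs on return and on exceptions alike.
        if ($fs1) { $fs1.Dispose() }
        if ($fs2) { $fs2.Dispose() }
    }
}

# Usage with two temporary files; the tiny buffer of 4 bytes forces several chunks.
$tmpA = [System.IO.Path]::GetTempFileName()
$tmpB = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmpA, 'hello world')
[System.IO.File]::WriteAllText($tmpB, 'hello world')
$same = Test-FileContentEqual (Get-Item $tmpA) (Get-Item $tmpB) 4
```

For regular files on a local disk the simpler loop usually behaves the same; the extra care mainly matters for streams that hand out data in smaller pieces.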
Buffering works
So what happens when we plug different values into $bufferSize? This is what I got on my machine by comparing the Wars Of Liberty bin of 140MB with itself:
[Table: comparison time per buffer size for the 140MB file]
A buffer that is large enough makes the file comparison way faster. That's why I settled on 524288 bytes as the default for this PowerShell function. Comparing the entire Wars of Liberty setup file of 2.2GB took me 42 seconds.
So what about a byte-by-byte comparison? That one took me 32 seconds on the 140MB file, which is more than 13x slower than using a buffer of 524288 bytes.
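For reference, a byte-by-byte comparison could look like the sketch below. The function name Test-FilesByteByByte is hypothetical and this is my own reconstruction of the idea, not necessarily the exact code that was measured. It reads one byte from each stream per iteration with ReadByte, which returns -1 at the end of the stream.

```powershell
# Hypothetical byte-by-byte comparison, shown for contrast with the buffered approach.
function Test-FilesByteByByte {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second)

    if ($first.Length -ne $second.Length) { return $false }

    $fs1 = $first.OpenRead()
    $fs2 = $second.OpenRead()
    try {
        do {
            # ReadByte returns the next byte as an int, or -1 at the end of the stream.
            $b1 = $fs1.ReadByte()
            $b2 = $fs2.ReadByte()
            if ($b1 -ne $b2) { return $false }
        } while ($b1 -ne -1)
        return $true
    }
    finally {
        $fs1.Dispose()
        $fs2.Dispose()
    }
}
```

Every iteration pays for two method calls and a comparison in interpreted PowerShell, which is a plausible reason why this approach falls behind once files get large.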
Just need a script?
If you just need a script, copy this text to compare.ps1 and run it from the command line, like .\compare.ps1 .\article.html .\article2.html. The result will be printed to the console.
<#
.SYNOPSIS
Compares two files. Returns True if the files are equal.
.DESCRIPTION
Compares two files. Returns True if the files are equal; otherwise False.
Use the bufferSize to optimize for speed. Might depend on your system.
.PARAMETER file1
The first file.
.PARAMETER file2
The second file.
.PARAMETER bufferSize
The size of the buffer will influence the speed of the script.
.OUTPUTS
True when equal otherwise False.
.LINK
More info: https://keestalkstech.com/2013/01/comparing-files-with-powershell/
#>
param(
    [Parameter(Mandatory = $true)]
    [string]
    $file1,
    [Parameter(Mandatory = $true)]
    [string]
    $file2,
    [uint32]
    $bufferSize = 524288)

$ErrorActionPreference = "Stop"
$PSDefaultParameterValues['*:ErrorAction']='Stop'

$first = Get-Item $file1
$second = Get-Item $file2

if ( $first.Length -ne $second.Length ) { return $false }
if ( $bufferSize -eq 0 ) { $bufferSize = 524288 }

$fs1 = $first.OpenRead()
$fs2 = $second.OpenRead()

$one = New-Object byte[] $bufferSize
$two = New-Object byte[] $bufferSize
$equal = $true

do {
    $bytesRead = $fs1.Read($one, 0, $bufferSize)
    $fs2.Read($two, 0, $bufferSize) | out-null

    if ( -Not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
        $equal = $false
    }
} while ($equal -and $bytesRead -eq $bufferSize)

$fs1.Close()
$fs2.Close()

return $equal
Further reading
While working on the subject I found some interesting reads:
Improvements
2020-10-10: added the Just need a script? section.
2020-10-10: on some systems the first 2 if statements did not work, according to caspertone2003; fixed it with his code.
2020-06-08: the original article used a byte-by-byte comparison, slowing things down on larger files. After writing about the impact of buffering on file streams, I rewrote this article to include buffering and .NET LINQ to improve the performance.