Bulk Comparison with Hashbytes

Imagine we have two tables (or views), TableNew and TableOld, and we want to know if their data is the same or different.

Assume they have the same schema.

Further assume that their primary key is a composite key with two columns, PK1 and PK2.

With
NEWSET as (
SELECT
	MBT.PK1,
	MBT.PK2,
   hashbytes('MD5',
			   (SELECT MBT.*
				FROM (VALUES(NULL))foo(bar)
				FOR xml auto)) AS [Hash]
FROM TableNew AS MBT)
,OLDSET as (
SELECT
	MBT.PK1,
	MBT.PK2,
   hashbytes('MD5',
			   (SELECT MBT.*
				FROM (VALUES(NULL))foo(bar)
				FOR xml auto)) AS [Hash]
FROM TableOld AS MBT)
,Comparison as (
Select
	n.PK1,
	n.PK2,
	Case
		when o.[Hash] is null then 'Not in old set'
		when o.[Hash] != n.[Hash] then 'Changed'
		else 'Same'
	end as Summary
from NewSet n
left outer join OldSet o on n.PK1 = o.Pk1 and n.PK2 = o.PK2
)
,Comparison2 as (
Select
	o.PK1,
	o.PK2,
	'Not in new set' as Summary
from OldSet o
left outer join NewSet n on n.PK1 = o.Pk1 and n.PK2 = o.PK2
where n.PK1 is null and n.PK2 is null -- i.e. not match
)
Select Summary, Count(*) as [Count] from Comparison group by Summary
UNION
Select Summary, Count(*) as [Count] from Comparison2 group by Summary

This will give you a summary table that tells you:

e.g.

Summary Count
Changed 607534
Not in new set 56
Not in old set 7017
Same 65449

Problem

The maximum input to hashbytes -- prior to SQL Server 2016 -- is 8000 bytes.

Solution for versions less than SQL Server 2016

This yak took many cans of shaving cream!

You can use this function to hash varbinary(max) ... (note you don't specify which hash function is used... so I'm guessing it's possible that two different database instances might use different hash functions inside the repl hash binary... it may be configurable? but within one database instance this should work consistently...)

master.sys.fn_repl_hash_binary(SOME_VARBINARY)

...but we don't have a var binary we have a nvarchar(max) !

So cast the nvarchar(max) to varbinary(max) before running the function....

master.sys.fn_repl_hash_binary(cast(SOME_NVARCHAR_MAX as varbinary(max)))

...And that appears to work, in SSMS... but the output is actually an array of bytes, not a varchar of any sort.

I found the trick this time is to convert the result to varchar(32) with a third parameter of 2, i.e.

Convert(varchar(32), master.sys.fn_repl_hash_binary(cast(SOME_NVARCHAR_MAX as varbinary(max))), 2) as Hasho

So here's our function modified to work with large nvarchar's prior to SQL Server 2016. (I'm writing/needing this in 2020!)

With
NEWSET as (
SELECT
	MBT.PK1,
	MBT.PK2,
	Convert(varchar(32), master.sys.fn_repl_hash_binary(cast((SELECT MBT.*
			FROM (VALUES(NULL))foo(bar)
			FOR xml auto) as varbinary(max))), 2) as [Hash]
	FROM TableNew AS MBT)
,OLDSET as (
SELECT
	MBT.PK1,
	MBT.PK2,
	Convert(varchar(32), master.sys.fn_repl_hash_binary(cast((SELECT MBT.*
			FROM (VALUES(NULL))foo(bar)
			FOR xml auto) as varbinary(max))), 2) as [Hash]
FROM TableOld AS MBT)
,Comparison as (
Select
	n.PK1,
	n.PK2,
	Case
		when o.[Hash] is null then 'Not in old set'
		when o.[Hash] != n.[Hash] then 'Changed'
		else 'Same'
	end as Summary
from NewSet n
left outer join OldSet o on n.PK1 = o.Pk1 and n.PK2 = o.PK2
)
,Comparison2 as (
Select
	o.PK1,
	o.PK2,
	'Not in new set' as Summary
from OldSet o
left outer join NewSet n on n.PK1 = o.Pk1 and n.PK2 = o.PK2
where n.PK1 is null and n.PK2 is null -- i.e. not match
)
Select Summary, Count(*) as [Count] from Comparison group by Summary
UNION
Select Summary, Count(*) as [Count] from Comparison2 group by Summary