Why do you land in the spam folder? [Fuzzy hashing]

Mar 4, 2024
We all send outbound sales emails to drive new sales and opportunities, and it is one of the cheapest ways to do it. The problem is that Google, Microsoft and other email hosts use fuzzy hashing, a smart way to detect near-duplicate content such as similar emails sent into Gmail (across all their users). If the same email is sent to 100 people using Gmail, the copies get fuzzy hashed and Gmail will start sending them to the spam folder.
So let's learn how to avoid this and understand what fuzzy hashing is.
Firstly, why was fuzzy hashing invented?
‘Fuzzy hashing’ was invented to flag spam emails, but has found application in everything from malware detection to genome sequence alignment.
So how does it work?
It is a hashing technique, not a machine learning model, that email hosts can run in near real time across all the emails landing with them. The goal is to determine whether two emails are the same without comparing every single byte in them. This may take some time, so your first emails will be OK, but after a while it will say welcome to spam land. The older approach digests the file into a short string, known as a cryptographic hash value. It’s in the nature of a cryptographic hash function that even the tiniest change in a file will yield a wildly different hash value, so it’s easy to tell if two files are identical.
Now the issue for email hosts is that with cryptographic hashing you can change one word and the email counts as unique, which is the good news for senders. That is why spin syntax, such as spinning the greeting Hey to Hi, used to work. But then fuzzy hashing came along and broke this by being smarter, so spinning a single word is no longer enough.
The diagram below shows how cryptographic hashing works as a model.
notion image
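To make the point about cryptographic hashes concrete, here is a minimal Python sketch using SHA-256 from the standard library. The two sample emails and the helper name `sha256_hex` are made up for illustration; nothing here is what Gmail or any other host actually runs.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two made-up emails that differ only in the greeting.
email_a = "Hey Anna, I saw your post about hiring and wanted to reach out."
email_b = "Hi Anna, I saw your post about hiring and wanted to reach out."

digest_a = sha256_hex(email_a)
digest_b = sha256_hex(email_b)

# Count how many of the 64 hex positions happen to hold the same character.
matching_positions = sum(a == b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print("identical:", digest_a == digest_b)
print("matching hex positions:", matching_positions, "of", len(digest_a))
```

Changing a single word gives a completely different digest, so a filter that only compares whole-message hashes would treat these two emails as unrelated.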
So how does fuzzy hashing work, and what did it change?
notion image
Fuzzy hashing, built on a rolling hash, solved this for mail providers. Instead of just hashing the entire file, the algorithm “provide[s] a continuous stream of hash values for a rolling window over the binary [of the original file],” says David French, a senior researcher at Carnegie Mellon’s Software Engineering Institute.
By comparing a bunch of hashes from two files, you can determine the percentage difference between them.
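The idea of comparing many small hashes can be sketched in a few lines of Python. This is a simplified illustration only: it slides a window over words rather than bytes and scores the overlap with a Jaccard index, which is not how production tools such as ssdeep or any email host work internally. The function names, the window size of 3 and the sample emails are all assumptions made for the example.

```python
import hashlib


def window_hashes(text: str, window: int = 3) -> set:
    """Hash every run of `window` consecutive words (a sliding window over the text)."""
    words = text.lower().split()
    if len(words) <= window:
        chunks = [" ".join(words)]
    else:
        chunks = [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]
    return {hashlib.sha256(chunk.encode("utf-8")).hexdigest() for chunk in chunks}


def similarity(a: str, b: str, window: int = 3) -> float:
    """Share of window hashes two texts have in common (Jaccard index, 0.0 to 1.0)."""
    hashes_a = window_hashes(a, window)
    hashes_b = window_hashes(b, window)
    return len(hashes_a & hashes_b) / len(hashes_a | hashes_b)


original = ("Hey Anna, I saw your post about hiring and wanted to reach out "
            "because we help sales teams book more meetings every month.")
# Same email with only the greeting spun from Hey to Hi.
spun_greeting = original.replace("Hey", "Hi", 1)
unrelated = ("Quarterly invoice attached, please confirm the payment date with "
             "your finance team before Friday so we can close the books.")

print("greeting spun:", round(similarity(original, spun_greeting), 2))
print("unrelated:    ", round(similarity(original, unrelated), 2))
```

Swapping only the greeting changes a single window, so the two emails still share almost all of their hashes, while the unrelated email shares none.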
Have you ever wondered how Google detects duplicate content so that it can deliver to you only the most relevant results, instead of the hundreds of copies of a news article or reference work that inevitably end up on splogs spread across the Internet? Fuzzy hashing, again! It can even detect identical documents that are stored in completely different formats!
This is the reason your sales emails go into spam. In layman's terms, they use the percentage of hashes that differ between your emails to check for similarities, and if your emails are very similar they land in spam.
So this is where Automated Syntax and 1-1 Personalization truly matter. If you change multiple parts of your email, you can beat the fuzzy hashing algorithm. Let's take the structure of an email below, before and after, as an example of what will do the trick and what you need to do today!
A simple email that is very likely to land in spam, before adding changes to optimise for the Inbox:
notion image
In the next version we are using Spin Syntax to mix the greeting, Liquid syntax to update the second part of the email based on job title, and spinning a few sentences. This is less likely to be matched by a fuzzy hash.
notion image
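For readers who want to see the mechanics, here is a rough Python sketch of how spin syntax and Liquid-style placeholders can be expanded. It is a toy renderer written for this post, not the real Liquid template engine or any sending tool's implementation. The `{a|b|c}` and `{{ name }}` formats, the function names `spin` and `render`, and the sample template are all assumptions.

```python
import random
import re

# Matches a non-nested spin group such as {Hi|Hey|Hello}.
SPIN_PATTERN = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option (no nesting)."""
    def pick(match: re.Match) -> str:
        return rng.choice(match.group(1).split("|"))
    return SPIN_PATTERN.sub(pick, template)


def render(template: str, fields: dict, rng: random.Random) -> str:
    """Fill Liquid-style {{ name }} placeholders first, then expand spin groups."""
    for key, value in fields.items():
        template = template.replace("{{ " + key + " }}", value)
    return spin(template, rng)


template = (
    "{Hi|Hey|Hello} {{ first_name }}, {{ job_line }} "
    "{Would you be open to a quick chat?|Do you have ten minutes this week?}"
)
# Hypothetical per-recipient data; job_line would be chosen from the job title.
fields = {
    "first_name": "Anna",
    "job_line": "as a Head of Sales you probably care about reply rates.",
}

rng = random.Random()
for _ in range(3):
    print(render(template, fields, rng))
```

Each run picks a different mix of greeting and closing line, and the middle of the email changes with the recipient's data, so more of the message differs from one send to the next.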
Finally, we are using Automated Syntax and 1-1 Personalization, so each email is truly unique, is not at risk of being matched, and will land in the Inbox.