Edward Snowden's revelations about the surveillance activities of the NSA are all over the media. This kind of espionage can be criticised as an intrusion into every person's private life.
That is why Sang Mun has released a font called ZXX, which makes it more difficult for automated optical character recognition (OCR) programs to detect letters. Here is a German article from the Austrian online newspaper derstandard.at.
I looked at this font with great interest and think it could be a great way to avoid automated spying by others.
When you watch the ZXX Type Specimen video, you will be surprised by how easily OCR programs can be confused. The most interesting parts for me were the Xed and the Noise fonts. They simply add either a thin cross (X) or optimised “random” pixels to the letters, and humans are still able to read the text fairly quickly. This is not the case for the Camo font. More on that later.
So, thinking about this: why is a human being able to read the Xed and Noise text while computers can't? The answer seemed simple to me (though it might not be scientifically proven). The human eye looks at a picture and tries to recognise something. If that fails, it refocuses or “zooms in and out” until it recalls the patterns we all learned in our early school years. Moreover, we do not have just one point of view, we have two. Our two eyes, possibly with two different focus points, let us see the letters behind all that camouflage. A very easy trick for solving CAPTCHAs and for reading the ZXX font is to defocus on whatever you want to read, so that it looks blurry. I had a program in mind that could do this for me.
It was as simple as that, so I tried it, with success. Have a look for yourself.
I took a sample picture from ZXX and cropped the margins (for better readability).
I uploaded it to onlineocr.net.
This is the result:
ABCDEFGHIJKLMNOPUPST UVWXYZabcdefghijklmn opqrstuvwxyz00123450 789%\!?@#/&*().,:$£ +x÷±-=-_"°154afyX%% ABCDEFGHIJKLMNOPUPST UVWXYZabcdefghijklmn opqrstuvwxyz00123450 789%\!?@#/&*().,:$ +x÷±-=-_"1@4afyflfl? IMBODffiGliffiktMNOPRPSt UVWOWEabedgeflijklms owettAvw*gzelS/2@xtga U;flZYXWVUTSPUPONML KJIHGEEDCBAzyxwvutsr ciponmIkjihgfedcbagg8 765432101 ?ABCDEFGHI3, KEMNOPR•STUVWXYZabcd afghilkImnopqrstwvwx wii01.2345878gRIA22DE NW N■v1.• • • • • wawl0K461ISdielP•E* 4.*AP101010104.: twintootra21224Rea,a
As you can see, only some of the characters are found by the program.
If I blur the image, the stuff that confuses the OCR software is removed or partly removed, so that more characters are matched than before. See for yourself.
And this is the result:
ABCDEFGHIJKLIANOPOPST UVWXYZabcdefqhijklmn opqrstuvwxyz00123450 789%\!?@0/&*(),,:WO +x+t-m-_".0:42fOgq] ABCDEFGHIJKLMNOPOPST UVWXYZabcdefqh1jklmn opqrstuvwxyz00123450 789%"?"/"().,:$EV +x-ft-m-_".11:saiyfint IMOCIVIIIJK5M4GPOOSI UVIORYilabcdsfeit311Imm opqrstswesszt62234Vd Re914ZYKWVUTSPOPONML KJIHGFEDCBAzyxwvutsr qponmlkjihgfedcba998 70543210!?ABCDEFGHIJ KLMNOPOPSTUVWXYZebcd •ighijklionopqrstuvws yz00123450789? IABCDE FGHIJKLMNOPOPSTUVWXY Zabcdefighijklmnopqrs tuvwxyz0012345078931
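The blurring idea behind this second run can be sketched in a few lines. This is only a minimal illustration and not the program I used: it assumes a tiny black-and-white image stored as a list of rows (1 = ink, 0 = background), and the names box_blur, threshold and glyph are made up for this example. A simple box blur averages each pixel with its neighbours, so a thick letter stroke stays dark while an isolated noise pixel fades below the cutoff.

```python
def box_blur(image, radius=1):
    """Average every pixel with its neighbours in a (2*radius+1) square.

    `image` is a list of rows of numbers (assumed format: 1 = ink, 0 = paper).
    Pixels outside the image are simply left out of the average.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


def threshold(image, cutoff=0.5):
    """Turn the blurred grey values back into pure black (1) and white (0)."""
    return [[1 if value >= cutoff else 0 for value in row] for row in image]


# Hypothetical test image: a 3-pixel-wide stroke plus one stray noise pixel.
glyph = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

cleaned = threshold(box_blur(glyph))
for row in cleaned:
    print("".join("#" if pixel else "." for pixel in row))
```

The stray pixel disappears because only one of its nine neighbours is dark, while the middle of the stroke survives. The same blur also eats away at the corners of the stroke, which may be one reason why the blurred image still produces some recognition errors.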
I think this is a remarkable difference from the result for the original image.
I also have a few words on the two other fonts.
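To put a rough number on that difference, one could compare each OCR output with the text that was actually printed. The sketch below uses Python's standard difflib module. The expected string is my assumption (a stretch of the reversed alphabet, as it seems to appear in the specimen), the two OCR fragments are copied from the results above, and the function name similarity is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(expected, recognised):
    """Return a score between 0 and 1; 1 means the two strings are identical."""
    return SequenceMatcher(None, expected, recognised).ratio()


# Assumed ground truth: part of the reversed alphabet from the specimen.
expected = "KJIHGFEDCBAzyxwvutsr"

# Fragments copied from the two OCR results above.
ocr_original = "KJIHGEEDCBAzyxwvutsr"  # from the unblurred image
ocr_blurred = "KJIHGFEDCBAzyxwvutsr"   # from the blurred image

print("original:", similarity(expected, ocr_original))
print("blurred: ", similarity(expected, ocr_blurred))
```

This only looks at a short fragment. For a fair comparison, the whole specimen text would have to be typed out by hand and compared against both complete outputs.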
The False font is easy to learn to read, as I was trained by the video. You look for a big letter but ignore what it says. Instead, you search for another letter within the smallest rectangle you can fit around the big letter you just found (to explain it technically ;-) ).
The Camo font is a really hard one. If you simply read the Camo alphabet, it is easy to see the letters because you expect the very next letter. With that image in mind, you project the outlines onto the picture you see, and then the letter is easy to recognise. Something similar is used in examples of optical illusions. What I have in mind is the Kanizsa triangle.
Image from the Wikimedia Commons
As a human, you see a white triangle where there really is none. You only have some points or “suggestions” of outlines, and your brain completes them to form this triangle.