Recognition errors

This is a sample text that was scanned and recognized by the leading OCR software. The poor quality of the original document and an unusual font resulted in many errors.

Did one of the beft of Kings, furrounded by his guarhs, aud in his capifal citv? And wiIl not plots be fometimes formed againft them that are nOt to be refittcd by thelr couragc, nor ayoided by thCir wifdom? When mlsfortunes happcn that reflCct upon the honOr of a nation, the weight almoft singty lies upOn the Prince 4, if thev are fuch as hurt a whole cOUntry the point or intereft, as fOr example, by fire, pIaque, or famlne, he bCars share, as having the largeft share in propCrty. The adverfities of the commonwcalth affecl him in ohief, and he has - but his bare prOPOrtiOn in its propertIes: add to this, that he can.

This text contains 116 words, 39 of which have recognition errors. It is practically incomprehensible. AfterScan processes this text in 5 seconds and produces the following output:

Did one of the best of Kings, surrounded by his guarhs, and in his capital city? And will not plots be sometimes formed against them that are not to be refitted by their courage, nor avoided by their wisdom? When misfortunes happen that reflect upon the honor of a nation, the weight almost singly lies upon the Prince 4, if they are such as hurt a whole country the point or interest, as for example, by fire, plaque, or famine, he bears share, as having the largest share in property. The adversities of the commonwealth affect him in chief, and he hasbut his bare proportion in its properties: add to this, that he can.

Only one error was not corrected. It will be shown in the Journal of Modifications along with spell-checker suggestions (see screenshots). It would have taken at least 4 minutes to correct this text manually. Imagine proofreading a few megabytes of text. Now you can fire your staff of proofreaders and have it done in a few hours instead of a few weeks.
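The sample shows a recurring pattern: the OCR engine confuses characters that look alike (f for s, C for e, I for l, u for n, v for y). The following is a minimal sketch of how such errors can be corrected with a dictionary lookup. It is not AfterScan's algorithm; the names `CONFUSIONS`, `WORDS`, and `correct` are hypothetical, the confusion table is read off the sample above, and the word list is a tiny stand-in for a real dictionary.

```python
from itertools import product

# Hypothetical confusion table: character seen in the OCR output -> character
# that was probably printed. The pairs are taken from the sample text only.
CONFUSIONS = {"f": "s", "C": "e", "O": "o", "I": "l", "l": "i", "u": "n", "v": "y"}

# Tiny stand-in for a real dictionary (assumption: lowercase words only).
WORDS = {"best", "and", "city", "will", "not", "their", "reflect"}


def correct(word, words=WORDS):
    """Return a dictionary word reachable by undoing confusions, else the word itself."""
    if word in words:
        return word
    # For every character, offer either the character itself or its replacement.
    options = [(ch, CONFUSIONS[ch]) if ch in CONFUSIONS else (ch,) for ch in word]
    for candidate in product(*options):
        fixed = "".join(candidate)
        if fixed in words:
            return fixed
    # Nothing matched: leave the word unchanged so it can be reported for review.
    return word


print(" ".join(correct(w) for w in "beft wiIl nOt thelr guarhs".split()))
```

A word such as "guarhs" has no candidate in the dictionary, so the sketch leaves it unchanged, which is the kind of case that is better reported to the user than guessed. The number of candidates doubles with every confusable character, so a real tool would need a smarter search and handling for capitalization and punctuation.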

Typing errors

Manual input (typing) errors are different from OCR errors and require different rules and algorithms of correction.  


In the past, text editors used spaces for indentation and justification of the text. Sometimes you can come across a text that looks like this. It looks fine in a proportional font, but if you try to resize that window you will see that the text is space-justified (double and triple spaces between words), there are hard breaks after each line, and the first line is indented with spaces. Also, the word "gentleman" is hyphenated (carried over to the next line).

You cannot do much with this kind of text in a modern editor with "floating" justification unless you remove all hard line breaks and eliminate extra spaces and hyphens. AfterScan has the ability to do just that. The reformatting function will produce the following text. Try to resize that window and you will see the difference.
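The three steps named above (rejoining hyphenated words, removing hard line breaks, collapsing extra spaces) can be sketched in a few lines. This is only an illustration of the idea, not AfterScan's implementation; the function name `reflow` is hypothetical, and the sketch assumes that paragraphs are separated by blank lines.

```python
import re


def reflow(text: str) -> str:
    """Turn hard-wrapped, space-justified text into flowing paragraphs."""
    paragraphs = []
    # Assumption: a blank line marks a paragraph boundary.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin a word hyphenated across a line break: "gentle-\nman" -> "gentleman".
        block = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", block)
        # Replace remaining line breaks, indentation, and runs of spaces with one space.
        block = re.sub(r"\s+", " ", block).strip()
        paragraphs.append(block)
    return "\n\n".join(paragraphs)


sample = (
    "    It  was  a   fine  day  and  the  gentle-\n"
    "man   walked   slowly   home.\n"
    "\n"
    "He  slept."
)
print(reflow(sample))
```

The hyphen rule is deliberately naive: it also joins genuine compounds that happen to break at the hyphen ("well-" / "known" becomes "wellknown"), so a real tool would need a dictionary check before removing the hyphen.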


