Converting Scanned Documents to Text

Although you wouldn't want to depend on it for mission-critical applications, Office comes bundled with a surprisingly good optical character recognition (OCR) system. Converting hardcopy to a Word document is a two-step process:

  1. Scan the document and turn it into a TIF file. To do so, from Windows, choose Start, Programs, Microsoft Office Tools, Microsoft Office Document Scanning, and follow the instructions.

Tip

The Document Scanner includes an option to "re-stitch" documents that are printed on both sides of the paper. You scan one side and then the other, and the software puts the two halves together, in sequence. It's a very handy feature.


  2. Turn the TIF file produced by the Office scanner (or any other TIF file, for that matter) into a Word document. Click Start, Programs, Microsoft Office Tools, Microsoft Office Document Imaging. Open the TIF file. Click the Send Text to Word icon. It can take a few minutes for the OCR engine to analyze a page, but the results can be quite good.

In experiments scanning a pristine printout of typical business text in 12-point Times New Roman, recognition rates were in the high 90s (percent of characters). On a faxed tabular report, originally printed on an impact printer and complete with (nonsimulated) coffee cup stains, Office still managed to recognize well over 80% of the characters.
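
If you want to run a similar experiment on your own documents, one rough way to estimate a character recognition rate is to compare the OCR output against a hand-typed reference copy of the same page. The following is a minimal sketch, not the method used for the figures above: the function name recognition_rate and the sample strings are made up for illustration, and Python's difflib gives an approximate alignment rather than a formal edit distance.

```python
from difflib import SequenceMatcher


def recognition_rate(reference: str, ocr_output: str) -> float:
    """Approximate fraction of reference characters reproduced by the OCR text.

    Hypothetical helper for illustration only. It counts characters that
    difflib can align, in order, between the two strings.
    """
    if not reference:
        return 1.0
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Invented sample: the OCR text confuses the letter "l" and the digit "1".
reference = "Quarterly revenue rose 12 percent."
ocr_output = "Quarter1y revenue rose l2 percent."
print(f"Recognition rate: {recognition_rate(reference, ocr_output):.1%}")
```

In practice you would paste in the text that Send Text to Word produced and a carefully proofread copy of the same page, then compare rates across different kinds of source documents.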

Whether that's good enough for your business is largely a matter of how the scanned documents will be used. If you're relying on sophisticated indexing to retrieve all the documents that include specific phrases, recognition in the high 90s might not be good enough, because a single misread character can keep a phrase from matching. On the other hand, if you want electronic copies of documents to get the gist of what was said, Office's tools are more than adequate.
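
To see why a rate in the high 90s can still hurt phrase searches, consider a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each character is misread independently of the others (real OCR errors tend to cluster, so treat the numbers as rough), and the function name and sample accuracy values are hypothetical.

```python
def phrase_survival(char_accuracy: float, phrase_length: int) -> float:
    """Chance that every character of a phrase is recognized correctly.

    Hypothetical helper: assumes each character is read correctly with
    probability char_accuracy, independently of its neighbors.
    """
    return char_accuracy ** phrase_length


# Illustrative accuracy values, not measurements of Office's OCR engine.
for accuracy in (0.97, 0.99, 0.999):
    chance = phrase_survival(accuracy, 20)
    print(f"{accuracy:.1%} per character: a 20-character phrase survives {chance:.0%} of the time")
```

Under these assumptions, even a small per-character error rate compounds quickly over the length of a phrase, which is why exact-phrase retrieval is more demanding than reading for the gist.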
