Services
Quote
Professionally equipped to manage your needs, we offer
services that allow you to allocate resources
efficiently while staying focused on your core
business. To request a quote, please click here to
email us. In your email, please provide a description
of the project and a contact phone number.
Our Brochure
To request our brochure, please click here to email
us. In your email, please provide a description of the
project and a contact phone number.
Optimizing PDFs for User Search
OCR, or Optical Character Recognition, refers to
technologies that "read" words from a scanned image
by translating each character in the image into
searchable text. OCR enables users to search for and
retrieve information within a file or page. In
addition, when a set of files is indexed, users are
able to search for keywords across an entire document
library and retrieve each matching page. With OCR,
searches that once required hours or days to complete
can be executed in seconds.
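
As a rough illustration of how keyword search across an indexed library can work, the sketch below builds a simple inverted index from OCR text and looks up the pages that contain every word in a query. It is a simplified example under stated assumptions, not the indexing system behind our service: the file names, page text, and the function names build_index and search are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of (file, page) locations containing it.

    `pages` is a dict of {(file_name, page_number): ocr_text}.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(location)
    return index


def search(index, query):
    """Return the sorted locations that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output for three scanned pages.
pages = {
    ("contracts.pdf", 1): "Lease agreement between the parties",
    ("contracts.pdf", 2): "Payment terms and renewal options",
    ("invoices.pdf", 1): "Invoice for lease payment, March",
}
index = build_index(pages)
print(search(index, "lease payment"))  # [('invoices.pdf', 1)]
```

Because the index is built once, each query is a handful of set lookups rather than a scan of every page, which is why indexed searches return so quickly.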
It is important to note that the quality and
condition of a paper document collection are key
factors in the successful recognition of characters
to create readable text. Therefore, to enhance the
quality of each original page, we start by focusing
on the scan quality of each image, removing noise
such as borders, speckles, and skew. In addition,
we utilize advanced color filter technologies to
remove any page background colors, in conjunction
with multi-light image capture technologies to
remove any shadows cast by page creases that could
impact image quality or recognition accuracy.
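
To give a sense of what speckle removal involves, here is a minimal sketch that clears isolated ink pixels from a black-and-white image. It assumes the page has already been reduced to a grid of 0 (background) and 1 (ink) values, and it is only one simple way to despeckle; the function name despeckle and the min_neighbors parameter are hypothetical, and this is not the filter used in our production workflow.

```python
def despeckle(image, min_neighbors=1):
    """Return a cleaned copy of a binary image (1 = ink, 0 = background).

    An ink pixel is cleared when fewer than `min_neighbors` of its eight
    surrounding pixels are ink. The input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += image[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4x5 scan fragment.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speckle
    [0, 0, 0, 1, 1],  # part of a pen stroke
    [0, 0, 0, 1, 0],
]
for row in despeckle(page):
    print(row)
```

The isolated pixel in the second row is cleared, while the three connected pixels of the stroke are kept because each has at least one ink neighbor.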
Once document scanning and processing are complete,
an OCR text layer is added behind each image
utilizing our dDOptimaOCR solution. This solution
begins with an additional orientation filter to
ensure that the best image is presented to the OCR
engines. Next, the characters in the image are
processed utilizing multi-engine OCR voting
technologies that rank each character to determine
the best text recognition fit. Then, once a word is
generated, it is filtered through a proprietary
lexicon to ensure the highest quality results.
Finally, this text is processed utilizing
sophisticated layout retention technologies to
represent the image text layout, providing the best
possible text representation for pinpoint search and
retrieval accuracy. Once these processes are
complete, the Quality Control Team generates an OCR
Benchmark Report detailing the accuracy of the OCR
process and the quality of the results.
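
The voting and lexicon steps described above can be pictured with a small sketch. It assumes each engine returns a reading of the same length for a word, takes a simple majority vote per character, and then checks the result against a word list using Python's standard difflib module. This is an illustration only: real multi-engine systems must also align readings of different lengths and weigh engine confidence, and the function names, sample readings, and tiny lexicon here are hypothetical, not the dDOptimaOCR implementation.

```python
import difflib
from collections import Counter


def vote_characters(readings):
    """Combine equal-length readings from several engines, one character at a time.

    The most common character wins at each position; ties go to the
    engine listed first.
    """
    if len({len(r) for r in readings}) > 1:
        raise ValueError("this sketch assumes readings of equal length")
    voted = []
    for chars in zip(*readings):
        counts = Counter(chars)
        best = max(counts.values())
        # First character, in engine order, that reaches the top count.
        voted.append(next(c for c in chars if counts[c] == best))
    return "".join(voted)


def lexicon_filter(word, lexicon, cutoff=0.8):
    """Return the word if it is in the lexicon, else its closest lexicon
    entry at or above the similarity cutoff, else the word unchanged."""
    lowered = word.lower()
    if lowered in lexicon:
        return word
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


lexicon = {"contract", "invoice"}

# Each engine misreads a different character, so the vote recovers the word.
print(vote_characters(["invoice", "inv0ice", "lnvoice"]))  # invoice

# Two engines share the same mistake, so the lexicon step has to catch it.
voted = vote_characters(["c0ntract", "c0ntract", "contract"])
print(voted, "->", lexicon_filter(voted, lexicon))  # c0ntract -> contract
```

The second example shows why the two steps complement each other: voting fixes errors that only one engine makes, while the lexicon catches errors that the engines agree on.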