Author: Eric G.E. Zuelow
Title: OmniPage & Adobe Acrobat 3.0
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
April 1999
Availability:

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

Source: OmniPage & Adobe Acrobat 3.0
Eric G.E. Zuelow


vol. 2, no. 1, April 1999
Article Type: Software Review
URL: http://hdl.handle.net/2027/spo.3310410.0002.119

OmniPage & Adobe Acrobat 3.0

Eric G.E. Zuelow

Preparing Documents for Publication: OmniPage & Acrobat

Whether one is preparing online course materials, departmental web sites, or online archives of historical sources, or creating a computerized version of printed materials, the ability to digitize text quickly is essential. Two applications make this job easier: OmniPage Pro and Adobe Acrobat. Because they are designed to accomplish different goals, both are worthwhile additions to a software library, and both are available for Windows and Macintosh.

OmniPage Pro 8.0 (Macintosh); OmniPage 9.0 (Windows)

Caere Incorporated
Requires 16MB RAM, 25MB Hard drive for Mac
Requires 16MB RAM, 45MB Hard drive for Windows
$119.99

OmniPage Pro 8.0 is an Optical Character Recognition (OCR) application. Essentially, OCR means that the software will convert the graphic file produced by scanning a page into a text file in most major word processing formats or HTML. In other words, if one finds a paragraph to quote in a book or on a web site, all that is needed is to scan the page into OmniPage and run an OCR; the resulting text can then easily be placed into a manuscript or web site. Furthermore, OmniPage is capable of 'reading' English, French, and German.

[figure]
OmniPage utilizes a convenient selection of tool menus, making this software easy to use.

The software itself is exceptionally easy to use and, given its impressive capabilities, makes relatively modest demands on the system: 25MB of free hard drive space and 16MB of RAM for the Macintosh version 8.0, and 45MB of hard drive space and 16MB of RAM for Windows version 9.0. The application retails for around $119.99.

Installing the software for both the Mac and Windows operating systems is fairly simple. Scanner installation, however, can pose numerous challenges given the range of potential hardware and software conflicts that may result.

Once the software is installed, the OCR process is very simple. One only needs to hit the scan button, then the OCR button, and finally save the file in whatever word processing, HTML, or graphic format is desirable. If one wishes to use only a small segment of a document, or wants the software to differentiate between graphics and text, an easy-to-use tool kit allows one to divide the newly scanned page into graphics and text zones. When only a paragraph of text is needed, turning this paragraph into a zone saves time because the software only needs to recognize a small portion of text rather than an entire page.

[figure]
The main OmniPage functions are easily accomplished through this simple toolbar.

For those who intend to publish their materials on the Web, OCR can be invaluable because the resulting HTML files are small and will load quickly in any browser, even at slower connection speeds.

Scanning in longer documents is just as easy as scanning a single paragraph. Instead of running the OCR function after each scan, simply wait until all desired pages have been scanned; the application then automatically compiles all pages into a single document.

A spell-checking function makes it easy to catch any errors resulting from the OCR process. Even so, I tend to spell-check files in whichever word processing application I am using at the time. Given that page formatting frequently does not survive the OCR process, it is easier for me to reformat the document as I spell-check it, but this is purely a personal preference.

[figure]
This document was originally a clear column of text with a "watermark" behind it. After running the OCR function in these conditions, the text became little more than garbled characters.

OmniPage, though exceptionally useful in many cases, is not without flaws. In particular, if the document is not of laser-print quality, OCR will not work properly. If a page contains even slightly blurry text, a "watermark," or any smudges, the OCR process will produce garbled characters. These complications can be a major problem for scholars working with documents which are handwritten, dirty, or on darker paper. Also, documents containing complex formatting will require substantial reformatting once they have been scanned in, a fact which may make OmniPage more troublesome than helpful.

Even with these problems, OmniPage can save time, bandwidth, and wrists by sparing users the need either to retype documents or to resort to another, more memory-hungry format for presentation.

Adobe Acrobat 3.0

Adobe Systems
Requires 16MB of RAM and 40MB of hard drive space for Mac and Windows
Requires 52MB of disk space for Unix
It retails for $184.99 (roughly $50 academic).

Assuming that one's ultimate goal is to produce digitized documents which are readable on multiple platforms, or to publish material on the Web which OmniPage cannot easily handle, Adobe's Acrobat is frequently the best available alternative. While pages must be in pristine condition to work properly with OmniPage, Acrobat will accurately reproduce the scanned document.

Acrobat is used to create Portable Document Format (PDF) files. These files are readable through Adobe's free Acrobat Reader application, regardless of platform. Acrobat Reader software is available on the Adobe Web site.

[figure]
In Acrobat one can integrate images and text to make attractive and informative documents. One can also create handy indexes allowing readers to quickly find what they're looking for in the document.

Those wishing to create Acrobat files must purchase a suite of applications collectively called Adobe Acrobat. This suite contains the Exchange, Distiller, Catalog, and Reader applications. Exchange is used to create Acrobat files, either by converting PostScript or Encapsulated PostScript (EPS) files into PDF format or by scanning the documents in directly. Exchange also allows one to create movies and add links to PDF files. Distiller is used to compress files, but will not read PDF format, so the user must save files created in Exchange to EPS or PostScript format before importing them into Distiller. Catalog is used to create searchable PDF files, one of the more exciting Acrobat features. In short, Acrobat is poorly organized. Nevertheless, once the user has worked through the confusion resulting from the software's poor design and documentation, it can be an effective tool.

[figure]
The Distiller interface is spartan because, once one has set one's preferences, the application works on its own to compress one's files.

Initially, I found Acrobat to be frustrating. Immediately upon installing the software, I discovered that it would not recognize my scanner—a commonly available and inexpensive UMAX Astra 610S. I searched the online reference guide and found no means of fixing the problem. Next, I searched the Adobe web site where I did find a reference to the problem I had experienced. Although I carefully followed their directions, I was unable to fix the problem. Further compounding my frustration, the only available help number was a long distance call to Adobe's Seattle headquarters. There was no e-mail support and no 1-800 number, in spite of the fact the software supposedly comes with free support for new users. With no means of fixing the problem cheaply, I worked around it by scanning documents using OmniPage and then saving them as EPS files. I was then able to open these files in Acrobat.

Immediately upon converting files into PDF format, I discovered another problem, at least as far as Web publishing is concerned. PDF files in their uncompressed format are massive. In my case, I had an 8-page newsletter which, when converted into PDF, swelled to nearly 11MB.

Distiller makes it possible to compact PDF files significantly. As Distiller will not read PDF files, one must convert the files back to EPS or PostScript, then move them into specially created "Watched Folders." Distiller, when open, automatically checks these folders and compresses any uncompressed files that it finds there. Once a file is compressed, it is moved to an "Out" folder which the software automatically creates in the Watched Folder. After this process, my 11 megabyte file was reduced to 700k, still a big file, but much smaller than before. It is worth noting that one could eliminate a step by opening the EPS file first in Distiller and then doing further work on it in Exchange or Catalog. It is also worth noting that Distiller offers several options for file compression. The option one chooses determines how much the file will be compressed and also which version of Acrobat Reader will be required to open it. In my case, I opted for the maximum amount of compression I could get without sacrificing legibility, which requires that those accessing my files have at least Reader version 3.0. As Acrobat Reader is free, this choice seemed reasonable.
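For readers unfamiliar with the "Watched Folder" pattern, the following is a minimal Python sketch of the general idea: scan a folder for PostScript or EPS files, process each one, and place the result in an "Out" subfolder. It is not Distiller's implementation. The function name, the suffix list, and the use of gzip as a stand-in for Distiller's conversion and compression are all assumptions made purely for illustration.

```python
import gzip
import shutil
from pathlib import Path
from typing import List

# Hypothetical suffix list: the file types the review says Distiller accepts.
WATCHED_SUFFIXES = {".ps", ".eps"}


def process_watched_folder(watched: Path) -> List[Path]:
    """Compress every PostScript/EPS file in `watched` into `watched/Out`.

    gzip is only a stand-in here; the real Distiller produces PDF files.
    """
    out_dir = watched / "Out"
    out_dir.mkdir(exist_ok=True)  # the "Out" folder is created automatically
    produced = []
    for source in sorted(watched.iterdir()):
        # Skip subfolders (including "Out") and unrelated file types.
        if not source.is_file() or source.suffix.lower() not in WATCHED_SUFFIXES:
            continue
        target = out_dir / (source.name + ".gz")
        with source.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        source.unlink()  # remove the input so it is not processed twice
        produced.append(target)
    return produced
```

A real watcher would call a function like this repeatedly (for example, every few seconds) for as long as the application is open, which matches the behaviour described above of Distiller checking its folders whenever it is running.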

Once the PDF file is created, putting it online is as simple as placing it on a Web server and linking to it. Users who have Acrobat Reader need only click on the link to open their Reader application and the PDF file within their browser window (assuming they have the correct Reader version and have installed it correctly).

[figure]
The Acrobat Cafe included on the Acrobat Reader CD contains many examples of features included in Acrobat Exchange, including this movie introducing the Cafe and its contents.

Acrobat contains numerous additional features. For example, one can place links into documents, create movies, or create a searchable directory of larger files. (An impressive catalog of examples is found on the Acrobat Reader CD which comes with the package. Even the complete works of Shakespeare are included!) One can then make the PDF documents available for download or distribute them through other channels. Given the power inherent in fully cataloged documents, it might be tempting to provide materials to web users only in Acrobat format, as several newspapers have done. (Arabic newspaper Al-Hayat is a good example.) Only file size poses a significant barrier to doing this. As most colleges and universities now provide students and faculty with direct Internet connections, providing course materials and handouts in Acrobat format is not unreasonable.
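To put the file-size barrier in rough numbers, the short Python sketch below estimates the compression ratio and the ideal transfer time for the newsletter discussed earlier. The 11MB and 700k sizes are the figures reported in this review. The link speeds (a 56 kbit/s modem and a 10 Mbit/s campus connection), the treatment of 1MB as 1024 x 1024 bytes, and the function names are assumptions chosen for illustration, and real-world throughput will be lower than these nominal rates.

```python
# File sizes reported in the review (1 MB assumed to be 1024 * 1024 bytes).
UNCOMPRESSED_BYTES = 11 * 1024 * 1024
COMPRESSED_BYTES = 700 * 1024

# Assumed nominal link speeds in bits per second (not from the review).
LINK_SPEEDS_BPS = {
    "56k modem": 56_000,
    "10 Mbit/s campus link": 10_000_000,
}


def compression_ratio(original_bytes, compressed_bytes):
    """Return how many times smaller the compressed file is."""
    return original_bytes / compressed_bytes


def transfer_seconds(size_bytes, bits_per_second):
    """Return the ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    ratio = compression_ratio(UNCOMPRESSED_BYTES, COMPRESSED_BYTES)
    print(f"Compression ratio: about {ratio:.1f} to 1")
    for name, speed in LINK_SPEEDS_BPS.items():
        before = transfer_seconds(UNCOMPRESSED_BYTES, speed)
        after = transfer_seconds(COMPRESSED_BYTES, speed)
        print(f"{name}: {before:.1f} s uncompressed, {after:.1f} s compressed")
```

Under these assumptions, the uncompressed file takes minutes rather than seconds over a modem, which is why compression matters far more to dial-up readers than to those on a direct campus connection.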

Problematically, the Acrobat user manual is only available online, which makes it difficult to work step by step alongside the manual. While saving paper is always appreciated, there is a tangible benefit to holding a hardcopy of the manual in one's hand while working with software as confusing as this program often is.

While I have been critical of Acrobat, it does have many benefits: no other application provides cross-platform communications as seamlessly as Acrobat; documents need not be in flawless condition to make highly readable PDF files; when printing, PDF files maintain their formatting while HTML files may become terribly corrupted; PDF files can be easily downloaded by users for future reference; PDF files require no major layout work—graphics are displayed along with text and in the same format as found on the printed page; and finally, being able to search large files is exceptionally helpful for users.

In the end, OmniPage and Acrobat provide differing ways of acquiring and displaying text documents on the computer. While each has tangible problems, the benefits offered by these applications generally outweigh the drawbacks.

Eric G.E. Zuelow
University of Wisconsin-Madison