Scanning and Editing Documents

One of my goals has always been to scan all of the TI99 documents in my collection, and upload them to my site so that others can enjoy them.  When I first started scanning things many years ago, the only way that you could make PDF documents was to pay a lot of money for Adobe software.  That wasn’t an option for me, so I used the free software that came with my Visioneer scanner to save scans into the MAX document format.  I’ve tried to convert most of the MAX files on this site to PDF format, but if you come across any on ftp.whtech.com, you can still view them using the old MAX viewer.  You can download a copy here.

Since then, the Visioneer software has changed into PaperPort, and I’ve purchased several licenses over the years.  The nice thing about PaperPort is that while you can use it for scanning, editing, and ordering pages in documents, it also allows you to save a document as a PDF file.  It handles MAX files as well, so I’ve used it to convert them to PDFs.  When I scanned my TI-99/4 manual, I didn’t want to break the binding, so I made one of these DIY book scanning rigs and used my digital camera to take a JPEG photo of each page.  That’s when I became familiar with ScanTailor.

ScanTailor is free, and does an amazing job of cleaning up and organizing document images.  It only handles document pages in TIFF or JPEG format (perfect for using with digital cameras).  If the pages are already in JPEG format, like mine were when I was using my camera, you just put them in one directory.  If they’re in a different format, you’ll have to convert them.  PaperPort is great for this, as you can ‘unstack’ PDF files, and then save the pages as JPEG or TIFF.

When you start ScanTailor, you point it to the directory that you’ve saved your pages in.  ScanTailor then lets you reorder and process these pages.  There are 5 processing stages:  fix orientation (rotate the page), split pages (if there are 2 pages in each image, tell ScanTailor where the split is between the 2 pages), deskew (correct tilted pages), select content (show the program where the content is on each page), and margins.  In the select content stage, ScanTailor tries to detect the content itself, but sometimes it doesn’t catch the page numbers in the corner, so I have to drag the selection box to cover them.  After that, you are given an opportunity to despeckle (remove stray dots) and dewarp (fix parts of the image that are curved because a page wasn’t flat), and then start processing.
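
If your camera names its files something like IMG_9.jpg and IMG_10.jpg, a plain alphabetical sort can put page 10 ahead of page 9.  Here is a minimal Python sketch of one way to check the order of a directory of page images and plan zero-padded names before loading them.  It isn’t part of any of the tools above: the function names and the ‘page’ prefix are made up for illustration, and it assumes the numbers in the filenames follow the page order.

```python
import re
from pathlib import Path

# The two formats mentioned above, with their common extensions.
PAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}


def natural_key(name):
    """Sort key that compares runs of digits as numbers, so IMG_9 < IMG_10."""
    # re.split with a capture group alternates text, digits, text, ...
    # so odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def ordered_pages(directory):
    """Return the page images in a directory in natural (numeric-aware) order."""
    pages = [
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in PAGE_EXTENSIONS
    ]
    return sorted(pages, key=lambda path: natural_key(path.name))


def rename_plan(pages, prefix="page"):
    """Pair each page with a zero-padded target name. Nothing is renamed here."""
    width = max(3, len(str(len(pages))))
    return [
        (path, path.with_name(f"{prefix}_{number:0{width}d}{path.suffix.lower()}"))
        for number, path in enumerate(pages, start=1)
    ]
```

Calling rename_plan(ordered_pages(...)) on your scan directory gives a list of (old, new) path pairs that you can look over first.  If the order looks right, applying old.rename(new) to each pair does the actual renaming.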

The pages are placed into an ‘out’ directory as JPEG or TIFF images.  If the despeckle function didn’t clean up some of the pages enough, I’ll open them up in a graphics program (I use Paint.NET, which is free) and erase any stray marks.  Then, I import the images into PaperPort, ‘stack’ them in the correct order, and save them as a PDF.

The last step is to run the PDF file through an OCR program so that you can search within the PDF.  I’ve been using the free version of PDF-XChange Viewer which you can download at http://www.tracker-software.com/product/pdf-xchange-viewer. Once you open a PDF with it, click on the ‘Document’ tab, and choose ‘OCR Pages’.

I’ve got a bit more on this, so I’ll post it in a ‘part 2’.  If you do things differently, or you have any suggestions, I’d like to hear them.  Please post comments!

-Rich
