
Free OCR Tools - Frustration  

Posted by Abba-Dad

I wanted to post this to give some folks an idea of the frustration you can expect when dealing with some free OCR tools. I try to use OCR (optical character recognition) to transcribe information from images I find mostly in online resources but sometimes also ones I scan or photograph. There are some extremely 'clean' resources out there that have been scanned in high-res and will look great in any OCR tool. But there are some awful scans out there as well. Let's run through an example.

In my last post I wrote about the obituary of Sarah Tuggle. I found the scan of the obit from 1883 on Ancestry.com and knew right away that this would not be an easy one to convert to text:


First of all, for some reason, Ancestry.com has recently started downloading images in PNG format. While this is a great format and is a close second to TIF, not many OCR applications can read it, so you have to convert it with some other tool. Luckily the basic Microsoft Office Picture Manager will do that in no time. But as you can see, the image is in extremely bad shape.

I tried first with PaperPort, which is a document organization tool that came with my DocuPen (an excellent handheld pen-sized scanner). PaperPort has a terrific OCR tool which works quickly and almost flawlessly when you deal with a good source image. But this is what I got with PaperPort:

sale.
.1 -d— of na
o~ueru~~ e siw rr.'i~'~:~ ~r ove.n~w a..n ne.. .eror..o
Close, right? That was the original PNG. Then I tried the converted JPG:
of a.. riu~~n r~ui:.
aim . .~ me .~ aor nee
wu r~
~:~~ ° .«ac.o
Not much better. I also have an OCR tool that came with my terrific HP OfficeJet Pro 8500. But I can never get it to work on images that were not scanned at a high DPI and it is clunky and not very user-friendly. I tried it anyway and just got frustrated some more.

Then I remembered that I had a great free OCR tool somewhere on the 70GB hard drive of my computer, but since I hadn't used it in a while I couldn't remember what it was called and couldn't find it anywhere. So I went to look for a good OCR tool online. And there are a lot of those out there.

SimpleOCR looked promising but it couldn't convert the file at all. I tried another, good image and it had a lot of errors anyway. The interesting feature was that it lets you choose from a drop-down list which word you want to use when it is not 100% sure what it scanned. Also, it has a 14-day trial for handwriting recognition, but you have to teach the system how you write and go through a whole training exercise. That might come in handy some day.

Another free program that intrigued me was TopOCR. The interesting thing here is that it is intended for photo capture with cameras of at least 3 megapixels. I was sure it would be able to handle some bad scans but this is what I got:
A Adds Ads, Am, Carob Beef ~e, Bulb Or Err. PIU~DeJ Tugger

dled ~uddonl7 ye~lerd~^r at Me r~ld~ ace of h

46ugbler, Art. Plorco llf Inure, on Butler~l~eL 8,

nob * try dlnner *ad wry ~ppuenllr troll, 81 o^lr~n~d & IlUle ox ~^lll0~ d~, howe~ot, Al . dl^cd league -liar TV ~~ ~~
It basically found only one word right - Butler. So this was not going to work. It is a very quick tool though and lets you edit the outcome in a side-by-side view next to the original:


When I tried a good image I got pretty good results. But my problem is not with good images, it's the crappy ones I need help with.

So finally I found the program I had been using before. Obviously it's called FreeOCR. Doh! It also lets you view the result side-by-side with the original and open the recognized text in MS Word. I can't seem to get a screenshot of this application for some reason but here is what I got when I ran it:
A lnddan Death.
In. Earnh Tuggln, wits nt Hr. Plukncy Tuggle.
dlcd suddnnly yesterdny at the ruldcnce cl her
daughter, Mr:. Plame Mlm, nn llutlaralrael. Shu
aw s Imm dinner and wu nppu-entlr wall. Sha
rnmulnmnij A lime ou smlug clown, however, and
dlud \».|‘un; any cue could mach her. _
The recognition wasn't great, but it was the closest I could get. And there was no difference between PNG and JPG either. When I ran better scans through FreeOCR it did great too. And it's free!
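If you want to put a number on "closest", here is a rough sketch in Python that scores OCR output against a hand-corrected transcription using the standard library's difflib. The function name char_similarity and the idea of lower-casing and collapsing whitespace before comparing are my own assumptions for illustration, not something any of these tools do. The reference text is the first three lines of the obituary as I transcribed it by hand; the other two strings are the same lines as FreeOCR and TopOCR returned them above.

```python
from difflib import SequenceMatcher


def char_similarity(ocr_text: str, reference: str) -> float:
    """Return a 0..1 similarity ratio between OCR output and a reference.

    Hypothetical helper for illustration: 1.0 means the texts are identical
    after normalization, values near 0 mean almost nothing matched.
    """
    # Collapse whitespace and ignore case so line breaks do not skew the score.
    a = " ".join(ocr_text.lower().split())
    b = " ".join(reference.lower().split())
    return SequenceMatcher(None, a, b).ratio()


# Hand-corrected transcription of the first lines of the obituary.
reference = (
    "A Sudden Death. "
    "Mrs. Sarah Tuggle, wife of Mr. Pinkney Tuggle, "
    "died suddenly yesterday at the residence of her"
)

# The same lines as FreeOCR and TopOCR returned them in this post.
freeocr = (
    "A lnddan Death. "
    "In. Earnh Tuggln, wits nt Hr. Plukncy Tuggle. "
    "dlcd suddnnly yesterdny at the ruldcnce cl her"
)
topocr = (
    "A Adds Ads, Am, Carob Beef ~e, Bulb Or Err. PIU~DeJ Tugger "
    "dled ~uddonl7 ye~lerd~^r at Me r~ld~ ace of h"
)

for name, text in [("FreeOCR", freeocr), ("TopOCR", topocr)]:
    print(f"{name}: {char_similarity(text, reference):.2f}")
```

A character-level ratio like this is only a crude yardstick, but it is enough to rank tools against each other on the same bad scan without counting errors by hand.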

Do you have a favorite OCR program (free or not)? I'd love to hear from some of you in the comments.

More obits - A Sudden Death  

Posted by Abba-Dad

Last time I wrote about the death of Pinkney J. Tuggle, and while searching for more information about him I ran across the obituary for his wife, Sarah Whitehead Battle Carter Tuggle. This one is shorter and very peculiar, as it doesn't give a lot of information:

A Sudden Death.
Mrs. Sarah Tuggle, wife of Mr. Pinkney Tuggle,
died suddenly yesterday at the residence of her
daughter, Mrs. Pierce Mims, on Butler street. She
ate a hearty dinner and was apparently well. She
complained a little on sitting down, however, and
died before any one could reach her.


The Atlanta Constitution - 8 May 1883.

Once again, the name of their son-in-law, Pierce Mims, is mentioned, but this time they live on Butler Street. I checked the 1883 Atlanta City Directory (page 439) and found that Pinckney J. Tuggle, a merchant, was renting at 9 Butler Street. In the address listings (page 119) there are actually 3 people listed as living at this address: P. Mims, J.P. (wrong initials) Tuggle and W. Hanley. I wonder who Hanley was.

So what does it mean that she "complained a little" and "died before any one could reach her"? This is very odd. I wonder how I can find out more about this incident.

Anyway, I just thought of another reason that Pinkney didn't want to be buried in Greene County at his father's plantation. His wife died 2 years before him and was buried at Oakland Cemetery in Atlanta.

I am going to write a follow-up to this post on two topics that annoyed me:
1. Why does Ancestry.com hide the city directories where you can't easily find them?
2. Why are some OCR products so terrible?