
Page Image Segmentation and In-Place Character Recognition Using A Convolutional Neural Network

ID: 2017-054

Dense pixel labeling of a document image at the same resolution as the input image.

Principal Investigator: William Barrett

Convolutional Neural Networks (CNNs) have produced excellent results in semantic pixel labeling of natural scenes. The inventors have demonstrated fully supervised deep CNN semantic segmentation of historical form document images with multiple class labels, including handwriting, machine print, form lines, decorations, and stamps. The utility of semantic pixel labeling is demonstrated in a content extraction system for historical forms, in which handwriting is paired with machine print to provide lexicon-constrained handwriting recognition. Semantic segmentation achieves high generalization accuracy on form variants with interleaved, overlapping strokes, even when trained on a single pixel-labeled form image. This work suggests that semantic segmentation makes automated indexing and transcription of highly mixed-content historical document images possible.

This technology has been realized in a computer program that performs various functions and is trainable from as little as a single representative ground truth-labeled image.

Previous methods based on neural autoencoders, convolutional kernels, or other trainable parameters are limited to predicting a single class for a single element from a larger receptive field. The inventors improve on this practice by predicting dense pixel maps: the network makes multiple pixel class predictions in a single forward pass, which reuses the otherwise redundant convolutional and pooling computations shared by adjacent pixels in the network's receptive field.
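The computational saving described above can be illustrated with a toy example. The following is a minimal sketch, not the inventors' network: the two 3x3 layers, their weights (`K1`, `K2`), the threshold classifier, and all function names are assumptions made for illustration. It compares classifying one 5x5 patch per pixel against a single dense pass over the whole image, and counts how many convolution windows each approach evaluates.

```python
# Hypothetical toy weights; a real network would learn these.
K1 = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]
K2 = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]


def pad(img, r):
    """Zero-pad a 2D list by r pixels on every side."""
    w = len(img[0]) + 2 * r
    rows = [[0] * r + list(row) + [0] * r for row in img]
    return [[0] * w for _ in range(r)] + rows + [[0] * w for _ in range(r)]


def conv_valid(img, kernel, counter):
    """3x3 'valid' convolution followed by ReLU; counter[0] counts windows."""
    out = []
    for y in range(len(img) - 2):
        row = []
        for x in range(len(img[0]) - 2):
            s = sum(kernel[i][j] * img[y + i][x + j]
                    for i in range(3) for j in range(3))
            row.append(max(s, 0))
            counter[0] += 1
        out.append(row)
    return out


def predict_patchwise(img, counter):
    """One forward pass per pixel: a 5x5 patch in, a single label out."""
    p = pad(img, 2)
    labels = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            patch = [p[y + i][x:x + 5] for i in range(5)]
            feat = conv_valid(patch, K1, counter)        # 3x3 feature map
            score = conv_valid(feat, K2, counter)[0][0]  # single value
            row.append(1 if score > 0 else 0)
        labels.append(row)
    return labels


def predict_dense(img, counter):
    """One forward pass for the whole image: layer-1 features are shared."""
    p = pad(img, 2)
    feat = conv_valid(p, K1, counter)
    scores = conv_valid(feat, K2, counter)
    return [[1 if s > 0 else 0 for s in row] for row in scores]


image = [[0, 0, 1, 1, 0, 0],
         [0, 1, 1, 0, 0, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 0, 0, 1, 0]]
patch_ops, dense_ops = [0], [0]
patch_labels = predict_patchwise(image, patch_ops)
dense_labels = predict_dense(image, dense_ops)
print(patch_labels == dense_labels, patch_ops[0], dense_ops[0])
```

Both functions return the same label map at the input resolution. On this 4x6 image the patch-wise version evaluates 240 convolution windows and the dense version 72, because the dense pass computes each layer-1 feature once rather than once per overlapping patch.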

Training a handwriting or optical character recognizer by direct regression of dense pixel maps is entirely novel, since all existing frameworks require either an a priori segmentation pipeline at the text line, word, or character level, or a weakly supervised attention mechanism that performs segmentation and transcription simultaneously. A priori segmentation methods suffer from reduced accuracy, poor generality, and bias because they are not trainable. This method can also recognize characters directly in their spatial context while preserving precise pixel-level location information for each character, without the need to impute any reading order. It is trainable end-to-end, which sets it apart from all previous methods for character recognition.
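To make the idea of in-place recognition concrete, the sketch below shows one way a dense per-pixel character map could be read out without imposing a reading order. It is an illustration under stated assumptions, not the inventors' procedure: the label map, the `class_names` mapping, the use of 4-connected components, and the function name `characters_in_place` are all hypothetical.

```python
from collections import deque


def characters_in_place(label_map, class_names):
    """Group same-class pixels into 4-connected components.

    label_map: 2D list of integer class indices, 0 meaning background.
    Returns one record per component with its class name, its bounding
    box as (top, left, bottom, right), and its pixel count.
    """
    h, w = len(label_map), len(label_map[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            c = label_map[y][x]
            if c == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over pixels sharing class c.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx]
                            and label_map[ny][nx] == c):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append({
                "char": class_names[c],
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "pixels": len(ys),
            })
    return found


# Hypothetical dense output: 1 = "A", 2 = "B", 0 = background.
label_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [1, 1, 0, 0, 0, 0],
]
chars = characters_in_place(label_map, {1: "A", 2: "B"})
for record in chars:
    print(record)
```

Each record carries the character class together with its pixel-level location, so downstream code can decide how, or whether, to order the characters.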

About the Market:

This invention could interest and benefit any entity that scans, processes, or recognizes documents and document components, particularly for pre-processing and semantic labeling of those components.

For more information, contact Dave Brown (801-422-4866)
