TY - THES
T1 - Text recognition express (TREx): a system for offline extraction and recognition of handwritten data from forms using self-organizing maps and support vector machines
A1 - Buensuceso, Richelle Jay M.
A2 - Camero, Jan Karlo L.
A2 - Caras, Chester Lawrence D.
LA - English
YR - 2004
UL - https://ds.mainlib.upd.edu.ph/Record/UP-99796217608094435
AB - Forms processing is an essential task in many organizations, and automating it has attracted intensive research interest because it reduces the labor of manual processing. The extraction and recognition of handwritten data is a grueling task, even for current recognition engines. Poor legibility of handwriting is a common problem, which suggests that "decoding" handwriting is not an easy task, especially when it is performed by a computer, a task known as handwriting recognition. Difficult as it may seem, handwriting recognition has proved to be a very powerful tool in technological advancement. "From the computer science perspective, the types of analysis involved are the recognition, the interpretation and the verification of handwriting. Handwriting recognition is the task of transcribing a language message represented in a spatial form of graphical marks, into a computer text, for example, a sequence of 8-bit ASCII characters." [1] Encoding forms is a tedious task performed every semester by the Engineering Administration Staff, owing to the massive number of forms and the variety of handwriting and data types to be processed. This is the focus of our system, which aims to extract the necessary information from the form, particularly the UP Form 5, with minimal user intervention. In this paper, we present a system that uses offline extraction and recognition of handwritten data to process forms, particularly the UP Form 5. The system is divided into six phases, grouped into two modules: Letter Detection and Letter Recognition.
Letter Detection comprises four phases: Data Gathering and Scanning, where sample forms are scanned with an optical scanner to obtain JPEG images; Field Isolation, where the areas to focus on during Segmentation are identified; Color Detection, where the images are converted to data the system can interpret (pixel values) and Self-Organizing Maps (SOM) are used to separate the colors of the handwriting from those of the blank form; and Segmentation, where the handwritten image is broken down further into smaller units of recognition. Letter Recognition comprises Feature Extraction, where each character is divided so that its primitive features are exposed, and the sixth phase, Character Recognition, which uses the resulting data. This phase is subdivided into two stages, Learning and Classifying, both carried out with Support Vector Machines (SVM). An additional phase, Post Processing or Contextual Verification, lets the user verify the correctness of the Character Recognition output and make any necessary changes to the result.
CN - LG 993.5 2004 C65 B84
KW - Optical character recognition.
KW - Image processing--Digital techniques.
KW - TREx (Computer program).
KW - Handwriting recognition.
ER -