The Concept and Types of OCR (Optical Character Recognition)

Table of contents

1. Introduction
  1.1 Background
  1.2 Motivation
2. Background Study
  2.1 OCR
  2.2 Types of OCR
3. Bangla OCR
  3.1 Existing research
  3.2 Existing projects
  3.3 Limitations
4. Proposed methodology and implementation
  4.1 Deep CNN
  4.2 Why Deep CNN
  4.3 Experimental data
  4.4 Training and recognition

1. Introduction

1.1 Background

With the advent of computers and Internet technology, the possibilities for collecting data and using it for a variety of purposes have exploded. The possibilities are particularly enticing when it comes to textual data. Converting the vast amount of data accumulated over the course of human history into digital format is vital for preservation, data mining, sentiment analysis, and more, all of which contribute to the progress of our society. The tool used for this purpose is called OCR.

1.2 Motivation

Like many other languages, Bengali can benefit from OCR technology, especially since it is the seventh most spoken language in the world, with approximately 300 million speakers. The Bengali-speaking population is found mainly in Bangladesh and the Indian states of West Bengal, Assam, Tripura, and the Andaman and Nicobar Islands, as well as in the ever-growing diaspora in the United Kingdom (UK), the United States (US), Canada, the Middle East, Australia, Malaysia, and elsewhere. Advancements in the digital use of the Bangla language are therefore of interest to many countries.

2. Background Study

2.1 OCR

OCR is the short form of optical character recognition. It is a technology for converting images of printed or handwritten text into a machine-readable, i.e. digital, format. Although OCR today is primarily focused on scanned text, early OCRs were analog. The world's first OCR is considered to have been invented by the American inventor Charles R. Carey, who built an image transmission system using a mosaic of photocells.
Later inventions focused on scanning documents to produce more copies or to convert them into telegraphic code; the digital format then gradually became more popular. In 1966, the IBM laboratory in Rochester developed the IBM 1287, the first scanner capable of reading handwritten digits. The first commercial OCR was introduced in 1977 by Caere Corporation. OCR became available online as a service (WebOCR) in 2000 on various platforms via cloud computing.

2.2 Types of OCR

By method, OCR can be divided into two types. Online OCR (not to be confused with "online" in the Internet sense) involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the movements of the pen tip as well as pen up/down switching. This type of data is known as digital ink and can be thought of as a digital representation of handwriting. The resulting signal is converted into letter codes usable in computer and word-processing applications. Offline OCR scans an image as a whole and does not process stroke order. It is a kind of image processing, because it attempts to recognize character patterns in given image files. Online OCR can only process written text in real time, while offline OCR can process images of both handwritten and printed text, and no special devices are necessary.

3. Bangla OCR

3.1 Existing research

Most successful Bangla OCR research so far has been carried out on printed text, although researchers are gradually moving into handwritten text recognition. Sanchez and Pal* proposed a classical line-based approach for continuous Bengali handwriting recognition, based on hidden Markov models and n-gram models. They used both a word-based LM (language model) and a character-based LM for their experiment and found better results with the word-based LM.
Garain, Mioulet, Chaudhuri, Chatelain and Paquet* developed a recurrent neural network model to recognize unconstrained Bangla handwriting at the character level. They used a BLSTM-CTC-based recognizer on a dataset consisting of 2,338 unconstrained Bengali handwritten lines, about 21,000 words in total. Instead of horizontal segmentation, they chose vertical segmentation, classifying words into "semi-ortho syllables". Their experiment yielded an accuracy of 75.40% without any post-processing.

Hasnat, Chowdhury and Khan* developed a Tesseract-based OCR for Bangla script, which they applied to printed materials. They achieved a maximum accuracy of 93% on clean printed documents and a minimum accuracy of 70% on a screen-printed image. This approach is clearly very sensitive to variations in letter shapes and is not well suited to recognizing handwritten Bengali characters.

Chowdhury and Rahman* proposed an optimal neural network configuration for Bengali handwritten digit recognition, consisting of two convolutional layers with Tanh activation, a hidden layer with Tanh activation, and an output layer with softmax activation. To recognize the 9 Bangla digit characters, they used a dataset of 70,000 samples and achieved an error rate of 1.22% to 1.33%.

Purkayastha, Datta and Islam* also used a convolutional neural network for Bangla handwritten character recognition. They were the first to work on handwritten Bengali compound characters. Their recognition experiment also included numeric characters and alphabetic characters.
They achieved 98.66% accuracy on digits and 89.93% accuracy on nearly all Bengali characters (80 classes).

3.2 Existing projects

Several projects have been developed for Bangla OCR; it should be noted that none of them works on handwritten text.

BanglaOCR* is an open-source OCR developed by Hasnat, Chowdhury and Khan* that uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in section 3.1.

Puthi OCR, aka GIGA Text Reader, is a cross-platform Bangla OCR application developed by Giga TECH. It works on printed materials written in Bengali, English and Hindi. The Android app version is free to download, but the desktop and enterprise versions require payment.

Chitrolekha* is another Bangla OCR using the Google Tesseract engine and the OpenCV image library. The app is free and may have been available on the Google Play Store in the past, but at present (as of 07/15/2018) it is no longer available.

i2OCR* is a multilingual OCR supporting over 60 languages, including Bengali.

3.3 Limitations

Many existing Bangla OCRs have major limitations, such as:

Segmentation: Two types of segmentation are used to separate individual characters/shapes, horizontal and vertical. Handwriting recognition OCRs using horizontal segmentation do not yield effective results on cursive Bengali text.

Cursive forms: Many OCRs have succeeded in recognizing individually written Bengali digits or characters, but when processing text written in cursive Bengali forms they do not give favorable results.

Variation in shapes: The way characters are written varies greatly from person to person, especially since Bangla has many shapes due to kar and compound letters. No OCR has yet been developed that recognizes all of these shapes in handwriting.

4. Proposed Methodology and Implementation

4.1 Deep CNN

Deep CNN stands for Deep Convolutional Neural Network. First, let's try to understand what a convolutional neural network (CNN) is.
Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic version of the artificial neuron is called a perceptron, which makes a decision by comparing a weighted sum of its inputs against a threshold value. A neural network is made up of interconnected perceptrons whose connectivity can differ in various configurations. The simplest topology is the feedforward network, consisting of three layers: an input layer, a hidden layer, and an output layer. Deep neural networks have more than one hidden layer, so a deep CNN is a convolutional neural network with more than one hidden layer.

Now to the question of what a convolutional neural network is. If neural networks are inspired by the human brain, CNNs go further by also drawing on similarities with the visual cortex of animals*. Since CNNs are influenced by research on receptive field theory* and the neocognitron model*, they are better suited than other computer vision techniques to learning multi-level hierarchies of visual features from images. CNNs have made significant progress in AI and computer vision in recent years. The main difference between convolutional neural networks and other neural networks is that a neuron in a hidden layer is connected only to a subset of the neurons (perceptrons) in the previous layer. Due to this sparse connectivity, CNNs are able to learn features implicitly, i.e., they do not need predefined features during training.

A CNN consists of a number of convolutional and pooling (subsampling) layers, optionally followed by fully connected layers.

Convolutional layer: This is the basic unit of a CNN, where most of the computation takes place. The input to a convolutional layer is an m × m × r image, where m is the height and width of the image and r is the number of channels. The convolutional layer has k filters (or kernels) of size n × n × q, where n is smaller than the image dimension and q can be equal to the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure: each filter is convolved with the image to produce k feature maps of size (m − n + 1) × (m − n + 1).

Pooling layer: Each feature map is then downsampled, typically with average or max pooling over contiguous p × p regions, where p ranges from 2 for small images (e.g. MNIST) to generally no more than 5 for larger inputs. Convolutional and pooling layers alternate to reduce the spatial dimension of the activation maps, lowering the overall computational cost. Common pooling operations are max pooling and average pooling.
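As a concrete check of the sizes above, here is a minimal NumPy sketch (an illustration only, not the proposed Deep CNN; the function names conv2d_valid and max_pool are ours) showing that a valid convolution of an m × m image with an n × n kernel yields an (m − n + 1) × (m − n + 1) feature map, and that non-overlapping p × p max pooling then shrinks each dimension by a factor of p:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution of a square image with a square kernel.
    Output shape is (m - n + 1) x (m - n + 1)."""
    m, n = image.shape[0], kernel.shape[0]
    out = np.zeros((m - n + 1, m - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the n x n window by the kernel and sum the products.
            out[i, j] = np.sum(image[i:i + n, j:j + n] * kernel)
    return out

def max_pool(fmap, p=2):
    """Non-overlapping p x p max pooling of a 2D feature map."""
    h, w = fmap.shape
    # Crop so both dimensions divide evenly by p, then take the max per block.
    blocks = fmap[:h - h % p, :w - w % p].reshape(h // p, p, w // p, p)
    return blocks.max(axis=(1, 3))

image = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 single-channel "image" (m = 8)
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter (n = 3)
fmap = conv2d_valid(image, kernel)                # (8 - 3 + 1) x (8 - 3 + 1) = 6x6
pooled = max_pool(fmap, p=2)                      # 2x2 max pooling: 6x6 -> 3x3
print(fmap.shape, pooled.shape)                   # prints (6, 6) (3, 3)
```

A real CNN would apply k such kernels (producing k feature maps) and learn the kernel weights during training; this sketch only verifies the shape arithmetic used in the text.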