Speech synthesis device for visually impaired patients
Table of contents
Summary
Introduction
Methods
System specifications
Software design of the image processing module
Design implementation
Compilation of the word correction module
Phoneme-to-speech converter
Summary
Results
Conclusion
References

Summary
With a maximum visual range of six meters and a maximum field of view of 20 degrees, people with low vision cannot make out the words and letters in an ordinary newspaper. This makes reading difficult, which can disrupt learning and slow the development of the patient's intelligence, so a device is needed to help them read more easily. One kind of device being developed today relies on another sense, the sense of hearing. The Text-to-Speech device described here scans an Indonesian-language text and reads it aloud by transforming it into voice. The purpose of the device is to take an image as input and produce voice as output. This article describes the design, implementation and experimental results of the device, which consists of three modules: an image processing module, a word correction module and a speech processing module. The device was developed on the Raspberry Pi 2, with a processor speed of 900 MHz. The audio output is easy to understand, with a total error rate of less than 2% and a processing time of almost two minutes for an A4-sized text input. The device assists visually impaired users by guiding them through voice prompts, and playback can be paused and stopped while the text is being read.

Introduction
According to Thylefors in Gianini (2004), visual impairment can have negative effects on learning and social interaction, which can in turn affect the natural development of intelligence and of academic, social and professional abilities [1]. Based on Riskesdas data from 2013, the total number of visually impaired people in Indonesia was 2,133,017 [2]. Low vision cannot be corrected with glasses. The maximum visual range for these patients is 6 meters, with a maximum field of view of 20 degrees, so they cannot read normally printed paper; they can read only if the characters or letters are large enough. This lengthens the reading process and strains the eyes. To help improve the quality of life of visually impaired people, a tool that reads articles aloud is needed. Because the degree of visual impairment varies from person to person, the device developed in this work uses another sensory channel to deliver the information contained in a text. It performs speech synthesis specially designed for visually impaired Indonesians, so that they can use the device without having to ask others for help and can use it to understand Indonesian-language literature.

Methods
The text-to-speech device consists of three main modules: the image processing module, the word correction module and the speech processing module. The image processing module sets the object position, camera focus and lighting, takes the photo and converts the image to text. The word correction module corrects the output of the image processing module to improve its accuracy by matching it against an Indonesian dictionary. The speech processing module transforms the text into sound and processes it with specific physical characteristics so that the sound can be understood.
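As an overview, the sketch below shows this three-stage flow. It is only an illustration: the function names and signatures are hypothetical placeholders for the three modules and do not come from the original design.

```python
# A sketch of the three-module pipeline, not the authors' code: function
# names and signatures are illustrative placeholders for the three modules.

def capture_and_recognize() -> str:
    """Image processing module: fix position, focus and lighting,
    photograph the page and convert the image to raw text with OCR."""
    raise NotImplementedError  # detailed in the image processing section

def correct_words(raw_text: str) -> str:
    """Word correction module: match each word against the Indonesian
    word list and replace likely OCR errors."""
    raise NotImplementedError  # detailed in the word correction section

def speak(text: str) -> None:
    """Speech processing module: synthesize the corrected text as speech
    and play it back with play/pause/stop controls."""
    raise NotImplementedError  # detailed in the speech processing section

def read_page() -> None:
    """One full reading cycle: image -> text -> corrected text -> voice."""
    speak(correct_words(capture_and_recognize()))
```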
One element of the image processing module is the OCR engine. When using an OCR engine, an initial setup and a number of preparation steps are required to give the engine the best possible input and reduce its limitations. The initial configuration is matched to the desired device specifications: the processing should achieve a minimal error rate while keeping the processing time short. This module does not modify the OCR algorithm; it only adds conditions that produce the best possible OCR input.

OCR (optical character recognition) is a technology that automatically recognizes characters through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and image processing in the computer replaces the human brain [3]. Tesseract OCR is an OCR engine based on matrix matching [4]. Tesseract was chosen because it is widely accepted around the world, because of its flexibility and extensibility, and because an active research community continues to develop it. OCR engines still suffer from defects such as edge distortion and the effects of dim lighting, so it remains difficult for most of them to produce high-precision text [5]. Supporting conditions are therefore needed to keep these defects to a minimum.

System Specifications
The device is designed under the following constraints:
a. The reading distance range is 38-42 cm.
b. The maximum thickness of the reading material is 3 cm.
c. The minimum lighting is 250 lumen/m2 (classrooms or offices with light visual work).
d. The maximum tilt of a text line is 5 degrees from vertical.
e. The maximum size of the reading material is A4 (210 x 297 mm).
f. The font size is at least 10 pt.
g. Typeface categories: roman, egyptian or sans serif.

Hardware System Design
The mount shown in Figure 2 is designed so that paper up to A4 size is fully captured by the camera. The distance between the camera and the object is 40 cm, and a 15 cm arm positions the camera above the center of the object. The Raspberry Pi camera module uses manual focus adjustment, so the initial lens setting has to be adjusted. A sharp input image also requires good lighting, so a series of LEDs provides additional light when the ambient light intensity is low.

Tesseract OCR Implementation
The input image captured by the camera has a size of 5 MP (2592 x 1944 pixels), or 215 ppi (pixels per inch). According to the Tesseract OCR engine specifications, the minimum readable character height is 20 pixels for uppercase letters, and Tesseract accuracy decreases below a 10 pt font size.

Software Design
The software processes the input image and converts it into text format; its implementation is described below.

Software Design of the Image Processing Module
Image capture is triggered by the user through a GPIO pin connected to a touch key, using the interrupt function. The photo is taken with the raspistill program in sharpening mode to make the image sharper. The resulting image is a .jpg file with a resolution of 2592 x 1944 pixels.
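The sketch below shows how this capture-and-recognize step could be wired together, assuming raspistill, the tesseract command-line engine with the Indonesian language data ("ind") and the RPi.GPIO package are installed. The GPIO pin number, pin-numbering mode, file paths and sharpness value are illustrative assumptions, not the authors' exact settings.

```python
# Minimal capture-and-OCR sketch for the image processing module.
import signal
import subprocess
import RPi.GPIO as GPIO

CAPTURE_PIN = 17             # assumed pin wired to the touch key
IMAGE_PATH = "/tmp/page.jpg"
TEXT_BASE = "/tmp/page"      # tesseract writes TEXT_BASE + ".txt"

def capture_and_recognize(channel):
    # Take a sharpened 2592 x 1944 photo, as described above.
    subprocess.run(["raspistill", "-o", IMAGE_PATH,
                    "-w", "2592", "-h", "1944", "-sh", "100"], check=True)
    # Run Tesseract OCR with the Indonesian language pack.
    subprocess.run(["tesseract", IMAGE_PATH, TEXT_BASE, "-l", "ind"], check=True)
    with open(TEXT_BASE + ".txt", encoding="utf-8") as f:
        raw_text = f.read()
    print(raw_text)          # handed to the word correction module in the device

GPIO.setmode(GPIO.BCM)
GPIO.setup(CAPTURE_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
# Trigger a capture on each touch-key press (interrupt-driven, as above).
GPIO.add_event_detect(CAPTURE_PIN, GPIO.FALLING,
                      callback=capture_and_recognize, bouncetime=300)
signal.pause()               # wait for key presses
```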
The Word Correction Module
Spell Checking
Spell checking is the task of detecting misspelled words in a document; the detections can be presented to the user in different ways. Spelling correction is the task of replacing a misspelled word with a hypothesis of its correct spelling. The most appropriate approach is to directly model what causes the errors and encode it into the correction algorithm or error model. The Damerau-Levenshtein edit distance was introduced as a way to detect spelling errors (Damerau, 1964). Phonetic indexing algorithms such as Metaphone, used by GNU Aspell (Atkinson, 2009), represent words by a "sounds-like" pronunciation and allow the correction to differ considerably in spelling from the misspelled word; Metaphone relies on a data file containing phonetic information. Linguistic intuition about the different causes of spelling errors can also be represented explicitly in the spelling system (Deorowicz and Ciura, 2005). Almost all spelling systems currently use a lexicon (dictionary). Dictionaries have difficulty handling items that do not appear in them, such as proper nouns, foreign terms or neologisms, which increase the proportion of out-of-dictionary terms (Ahmad and Kondrak, 2005) [6].

Word Correction
The module receives its input from the image processing module in the form of text. The image processing module cannot determine whether an output word is correct or not, so a word correction module is required to correct the words it produces; the module is designed to improve the accuracy of the image processing output. It contains one main function, the correct function, plus supporting functions that adjust the text to Indonesian grammar. The correct function matches the input against an Indonesian dictionary (word list) and corrects it. The supporting functions handle the constraints on numbers and proper names described in the literature, namely:
1. a function to split the text into words;
2. a function to check for numbers in the text;
3. a function to check the capital letter at the beginning of a sentence;
4. a function to check the punctuation mark at the end of a sentence;
5. a function to check for names (capitalized words) within a sentence;
6. a function to recombine all the words after the previous steps.

Design Implementation
The implementation of the word correction module begins with organizing the Indonesian words to be used in the dictionary, which is used to compare each input word against the Indonesian language. The words in this dictionary come from the words in KBBI (Kamus Besar Bahasa Indonesia); after reduction, the dictionary contains 50,850 words. This number is a combination of base words, conjunctions, reduplicated words, loanwords, numerals, question words, pronouns, affixes, prefixes and suffixes.

Compilation of the Word Correction Module
The word correction module was compiled by adapting the corrector written by Peter Norvig. Because common errors in the image processing output usually occur in individual letters rather than in the length of the word, the correction function simply replaces the erroneous word, and it does so only if the length of the input equals the length of a word in the dictionary. Using this type of replacement also keeps the computational load in check. If only the replacement function is used, then since the word length is n and the edit distance is one, only n-1 transposition candidates arise. The spelling correction literature reports that 80% to 95% of spelling errors lie within an edit distance of one from the target. In research conducted by Peter Norvig on 270 spelling errors, only 76% were within an edit distance of one; further examination gave good coverage, since only three of the 270 test cases were at a distance greater than two, meaning that a correction considering up to two letters covers 98.9% of cases. Since the correction does not exceed an edit distance of two, the optimization that can be made is to keep only candidate substitutions that are themselves known words [7]. There is no general rule limiting the number of character differences to correct, but based on these earlier results and the computational load, a two-character limit is used in this correction function. The correction function uses a probability-based method trained on word occurrences, so that the word chosen to replace a corrected word depends on its frequency of occurrence.
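The sketch below follows the correction strategy just described: candidates are restricted to dictionary words of the same length as the input, differing in at most two characters, and the most frequent candidate wins. It is a simplified reconstruction rather than the authors' code; the dictionary file name "kamus.txt" and the use of raw counts as frequencies are assumptions.

```python
# Simplified same-length, two-character-difference corrector.
from collections import Counter

with open("kamus.txt", encoding="utf-8") as f:
    # Word list derived from the KBBI; repeated corpus words raise their counts.
    WORD_FREQ = Counter(f.read().lower().split())

def char_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def correct(word: str) -> str:
    w = word.lower()
    if w in WORD_FREQ:
        return word                    # already a dictionary word
    candidates = [c for c in WORD_FREQ
                  if len(c) == len(w) and char_differences(c, w) <= 2]
    if not candidates:
        return word                    # leave unknown tokens (names, numbers) alone
    # Probability-based choice: prefer the most frequent candidate.
    return max(candidates, key=WORD_FREQ.get)
```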
The Speech Processing Module
TTS (Text-to-Speech) is a system that converts text input into speech. It basically consists of two subsystems. The text-to-phoneme converter turns a sentence typed in a particular language into a series of codes, usually the phoneme codes together with their duration and pitch; this part is language dependent.

Phoneme-to-Speech Converter
The phoneme-to-speech converter accepts as input the codes, pitch and duration of the phonemes produced by the previous part.

System Design
Figure 5 shows the level 0 design of the speech processing module. Considering the use of the Linux platform, the availability of an Indonesian dialect and the TTS simulation results, eSpeak and Google TTS were selected as the TTS software. The general functional specifications of the system are as follows: the output voice is in the Indonesian dialect, with a reading intelligibility tolerance of 0.02%, and there are additional features for playing, stopping and pausing the sound.

Design Implementation
Python's standard library covers a wide range of modules. The speech processing module uses the os package, which provides file and process operations; the pygame package, which provides sound playback functions; the RPi.GPIO package, which provides a class to control the GPIO pins on a Raspberry Pi; and the subprocess package, which allows new processes to be spawned, connected to their input/output/error pipes, and queried for their return codes. isPause and isStop are variables used by the audio player functions; they are initialized to False, meaning they are not yet active. The GPIO pin numbering is set according to the breakout board. The main program provides functions to capture and process the input image, convert it into a sound signal, and play, stop, pause or exit playback.
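A minimal sketch of this module is shown below, assuming eSpeak with an Indonesian voice ("id") and pygame are installed. The WAV file path, and the choice of eSpeak rather than Google TTS for this sketch, are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal speech synthesis and playback sketch for the speech processing module.
import subprocess
import pygame

WAV_PATH = "/tmp/speech.wav"
is_pause = False   # mirrors the isPause flag described above

def synthesize(text: str) -> None:
    # Generate an Indonesian-voice WAV file with eSpeak.
    subprocess.run(["espeak", "-v", "id", "-w", WAV_PATH, text], check=True)

def play(text: str) -> None:
    synthesize(text)
    pygame.mixer.init()
    pygame.mixer.music.load(WAV_PATH)
    pygame.mixer.music.play()

def toggle_pause() -> None:
    # Pause or resume playback, toggling the flag each time.
    global is_pause
    if is_pause:
        pygame.mixer.music.unpause()
    else:
        pygame.mixer.music.pause()
    is_pause = not is_pause

def stop() -> None:
    # Stop playback entirely (the isStop behaviour described above).
    pygame.mixer.music.stop()
```

In the device, these functions would be bound to the GPIO touch keys through RPi.GPIO event callbacks, in the same interrupt-driven style shown for image capture above.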