{"id":174899,"date":"2023-12-09T18:09:00","date_gmt":"2023-12-09T17:09:00","guid":{"rendered":"https:\/\/liora.io\/en\/?p=174899"},"modified":"2026-02-06T08:42:49","modified_gmt":"2026-02-06T07:42:49","slug":"teaching-the-computer-to-read-with-optical-character-recognition","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/teaching-the-computer-to-read-with-optical-character-recognition","title":{"rendered":"Teaching the computer to read with Optical Character Recognition"},"content":{"rendered":"<p><strong>Optical Character Recognition (OCR), also known as ocherecognition, encompasses all the methods used to generate text files from images containing handwritten text. With the advent of digital technology and automation, OCR has become an indispensable tool, as images containing text cannot be processed by a computer.<\/strong><\/p>\t\t\n\t\t\t<h3>1. A little history<\/h3>\t\t\n\t\t<p>The<strong> first version of a machine<\/strong> capable of recognizing characters in a handwritten document was developed by German engineer Gustav Tauschek in 1929. The principle was as follows. A document containing handwritten text was placed in front of the machine&#8217;s window.<\/p><p>Then, to detect the correct character, a wheel with letter-shaped holes rotated until a photodetector ensured that the image of the character and the hole coincided perfectly. Once the character had been detected, it was written on a sheet of paper.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/09\/ocr.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption>Gustav Tauschek reading machine<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t<h3>2. Using Deep Learning for OCR<\/h3>\t\t\n\t\t<p>It&#8217;s easy to see that<strong> scanning with mechanical machines,<\/strong> however powerful, can be very costly and time-consuming. Companies wishing to automate their digitization pipeline need a fast, efficient handwriting <a href=\"https:\/\/liora.io\/en\/mushroom-recognition\">recognition tool.<\/a> Thanks to the <a href=\"https:\/\/liora.io\/en\/deep-learning-python-unveiling-the-basics\">development of Deep Learning<\/a>, particularly in the field of image processing, there are now algorithms capable of solving OCR problems. The OCR model can be broken down into 2 stages: text detection and recognition.<\/p><p>The first is the process of detecting text zones in a document, which can be a very <strong>complicated task,<\/strong> as documents can have very different structures. Text zones will not be in the same place between a novel and a newspaper, for example. <br>The second is to recognize text for a zone of the document containing it.<\/p>\t\t\n\t\t\t<h3>3. Text detection using Deep Learning<\/h3>\t\t\n\t\t<p>There are several methods for detecting text within a document. Some are inspired by object detection models (Single Shot Box Detection, <strong>Faster-RCNN) u<\/strong>sed to detect faces, cars, etc. in an image. This type of model returns a Bounding Box, which is simply a frame surrounding the object to be detected. To detect text, these models are modified and adapted.<\/p><p>One example is the TextBoxes model, which is based on the <strong>SSD (Single Short Detector)<\/strong> model, but adds Bounding Boxes that are more specific to text detection. The template returns image areas containing text.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/www.google.com\/url?q=https:\/\/www.researchgate.net\/figure\/The-architecture-of-TextBoxes-a-29-layer-fully-convolutional-network-including-13_fig3_322355323&amp;sa=D&amp;source=docs&amp;ust=1663177989034716&amp;usg=AOvVaw2f0LVJREyJPzCjZ3-7EQOi\">\n\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/10\/Ex-architecture-modele-reconnaissance-texte-OCR-1.jpg\" title=\"\" alt=\"architecture modele de reconnaissance de texte\" loading=\"lazy\">\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t\t\t<figcaption>Example of the architecture of a text recognition model<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>There are also other detection methods based on segmentation. <strong>Segmentation<\/strong> is a process that breaks down an image into a number of zones, called classes, containing similar elements. Here&#8217;s an example:<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/www.google.com\/url?q=https:\/\/makina-corpus.com\/societe\/partenariat-avec-lecole-polytechnique-federale-de-lausanne-pour-developper-un-outil-de&amp;sa=D&amp;source=docs&amp;ust=1663177989031206&amp;usg=AOvVaw3mO5Xe4eQBwjiEwbJYjIPo\">\n\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/09\/ocr2.png\" title=\"\" alt=\"\" loading=\"lazy\">\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t\t\t<figcaption>Example of image segmentation into several classes (car, sidewalk, people, trees, &#8230;)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p><a href=\"https:\/\/liora.io\/en\/convolutional-neural-network-everything-you-need-to-know\">Convolutional neural networks (CNNs)<\/a> and pyramidal neural networks (PNNs) are often used for text segmentation. Here&#8217;s the architecture of a high-performance model called TextSnake.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/www.google.com\/url?q=https:\/\/github.com\/princewang1994\/TextSnake.pytorch&amp;sa=D&amp;source=docs&amp;ust=1663177989032936&amp;usg=AOvVaw0fOc0VjJpdcTIgPDfPbGPe\">\n\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/10\/TextSnake-OCR-1.jpg\" title=\"\" alt=\"TextSnake\" loading=\"lazy\">\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t\t\t<figcaption>Architecture and segmentation example of the TextSnake model<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t<h3>4. Text recognition using Deep Learning<\/h3>\t\t\n\t\t<p>We have just seen the first stage of an occlusion model, which is the detection of areas of the document containing text. Once this has been achieved, the next step is to recognize the characters that make up these areas, in order to deduce words and thus translate the handwritten document into a usable text format.<\/p><p>The <a href=\"https:\/\/liora.io\/en\/cracking-the-code-on-underfitting-in-machine-learning\">Machine Learning models<\/a> most commonly used to solve text recognition problems are recurrent neural networks. Their advantage is that the hidden layers that make them up share a common memory band, which means that prediction is influenced by previous predictions. If you&#8217;d like to find out more about how they work, a full article on recurrent neural networks is available on our blog.<\/p><p>The most commonly used layers for <a href=\"https:\/\/liora.io\/en\/recurrent-neural-network-what-is-it\">recurrent neural network<\/a> models are LSTMs and GRUs. We add convolutional layers to these recurrent layers. Convolution is used to extract relevant, local features from images containing text, and recurrent layers are used to assign labels corresponding to the characters present in the image. For each feature created by convolution, a probability vector is associated with each label.<\/p><p>Text recognition can be summarized as follows:<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/10\/reconnaissance-text-dl-OCR-1.jpg\" title=\"\" alt=\"reconnaissance de texte OCR\" loading=\"lazy\">\t\t\t\t\t\t\t\t\t\t\t<figcaption><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t<p>Don&#8217;t forget that there must be a text detection part upstream for the recognition part to work correctly.<\/p><p>In this example, we can see that the input image represents the text &#8220;a b&#8221;. The<br>convolution part will decompose the image into a feature map. Here we can see that the image has been split into 5 sequences. The recurrent layers then associate a class probability with each sequence. There are as many classes as possible characters (letters of the alphabet, special characters, etc.). For each sequence, we choose the class that maximizes the probability, obtain a sequence of characters and then deduce our final text.<\/p><p>However, this model does not allow us to check that the word or sentence returned in output is without spelling mistakes, as we predict character by character without worrying about this. There are other models that implement a spelling correction system. This approach is called a language model, and can improve the accuracy of our model.<\/p>\t\t\n\t\t\t<h3>5. Conclusion<\/h3>\t\t\n\t\t<p>We&#8217;ve just seen a <strong>Deep Learning<\/strong> model for solving the occlusion problem. Deep Learning has really revolutionized this field, offering highly accurate predictions in contrast to models that had been developed previously. We can see that the model uses techniques normally used in other fields, such as object<a href=\"https:\/\/liora.io\/en\/nlp-training-become-an-nlp-pro-and-master-the-art-of-natural-language-processing\"> detection in an image or Natural Language Processing<\/a>, and that these techniques work very well for occlusion.<\/p><p>If you enjoyed this article and would like to discover other Deep Learning methods, you are welcome to join our Deep Learning expert course.<\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/deep-learning\">Discover our Deep Learning course<\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Optical Character Recognition (OCR), also known as ocherecognition, encompasses all the methods used to generate text files from images containing handwritten text. With the advent of digital technology and automation, OCR has become an indispensable tool, as images containing text cannot be processed by a computer. 1. A little history The first version of a [&hellip;]<\/p>\n","protected":false},"author":76,"featured_media":174900,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-174899","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/174899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/76"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=174899"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/174899\/revisions"}],"predecessor-version":[{"id":206188,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/174899\/revisions\/206188"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/174900"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=174899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=174899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}