{"id":168689,"date":"2023-06-16T16:15:15","date_gmt":"2023-06-16T15:15:15","guid":{"rendered":"https:\/\/liora.io\/en\/?p=168689"},"modified":"2026-02-06T09:01:32","modified_gmt":"2026-02-06T08:01:32","slug":"data-poisoning-a-threat-to-machine-learning-models","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/data-poisoning-a-threat-to-machine-learning-models","title":{"rendered":"Data Poisoning: a threat to Machine Learning models"},"content":{"rendered":"<style><br \/>\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style>\n<p><strong>Among the many computer attacks that exist and that attack IT systems, Data Poisoning is characterized by the falsification of training data for Machine Learning models. What does this mean? Does it represent a real danger? Here&#8217;s a brief overview of this particular attack, the threats it poses, and how to defend against it.<\/strong><\/p>\n<h3>What is data poisoning?<\/h3>\nData Poisoning attacks first appeared with the massive advent of <a href=\"https:\/\/liora.io\/en\/machine-learning-what-is-it-and-why-does-it-change-the-world\">Machine Learning models <\/a>at the end of the 20th century.&nbsp;\n\nThese attacks occur during the training phase of machine learning models. A machine learning model <b>needs to be trained<\/b> with data to function. 
Gradually, the machine learning model will <b>learn from its mistakes<\/b> and perform its task more and more accurately.\n\nA predictive model is a computer program that will be able to perform a particular task, such as recognizing an image or classifying a message.\n\n<img decoding=\"async\" width=\"512\" height=\"275\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/06\/Data-poisoning2.jpg\" alt=\"Data-poisoning2\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/06\/Data-poisoning2.jpg 512w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/06\/Data-poisoning2-300x161.jpg 300w\" sizes=\"(max-width: 512px) 100vw, 512px\">\n\nA Data Poisoning attack, by acting on the training phase, will alter or even completely distort the results of the <b>predictive model<\/b>. The attacks on <b>Google&#8217;s anti-spam system<\/b> between 2017 and 2018 show how this works. Google&#8217;s anti-spam model is trained with data known as input\/label pairs.\n\nThe input is an email or text message, and the label indicates whether the message is spam or not.\n\nThis is where the <b>Data Poisoning attack<\/b> comes in. It corrupts and falsifies this training data on a massive scale, indicating, for example, that a spam message is not spam. The attack degrades the accuracy of the machine learning model. In Google&#8217;s case, spammers can then rub their hands together: they can send spam without being flagged by Google&#8217;s anti-spam model. Data poisoning attacks can also <b>target traffic sign recognition models<\/b>, used in autonomous cars, for example. 
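The label-flipping mechanism described above can be sketched in a few lines of Python. The toy data and the keyword-counting classifier are purely illustrative assumptions, not Google's actual anti-spam pipeline:

```python
# Toy illustration of label-flipping data poisoning (hypothetical data):
# an attacker mislabels spam examples as "ham" in the training set,
# so the trained model no longer recognizes spam vocabulary.
from collections import Counter

def train(pairs):
    """Count word occurrences per label -> a naive keyword model."""
    model = {"spam": Counter(), "ham": Counter()}
    for text, label in pairs:
        model[label].update(text.lower().split())
    return model

def classify(model, text):
    """Score a message against each class's word counts."""
    words = text.lower().split()
    spam_score = sum(model["spam"][w] for w in words)
    ham_score = sum(model["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

clean = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
# The attack: flip every spam label to "ham" before training.
poisoned = [(text, "ham") for text, _ in clean]

honest = train(clean)
corrupt = train(poisoned)
print(classify(honest, "claim your free money"))   # -> spam
print(classify(corrupt, "claim your free money"))  # -> ham
```

The corrupted model lets the same spam message through, which is exactly the effect the attackers were after.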
If this model is poisoned, it could very well confuse a stop sign with a speed limit sign.\n\n<img decoding=\"async\" width=\"512\" height=\"237\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2023\/06\/Data-poisoning1.jpg\" alt=\"Data-poisoning1\" loading=\"lazy\">\n\nThis type of attack has also become accessible even to unskilled hackers. Previously, Data Poisoning attacks were <b>difficult to implement<\/b> because they required a lot of computing power, time, and money. But new techniques have made it possible to bypass these obstacles. The <b>TrojanNet backdoor technique<\/b> is particularly problematic. By creating a small neural network that detects a series of patches, this technique does not require access to the original model and can be executed on a basic computer.\n<h3>What are the dangers of Data Poisoning?<\/h3>\nThe fact that data poisoning attacks have become so accessible makes them a real danger. Once the model training phase is over, it&#8217;s very difficult to <b>correct the machine learning model<\/b>. Doing so would require a lengthy analysis of all the inputs that trained the model, to detect and remove the fraudulent ones. But if the mass of data is too large, this analysis is simply impossible. The only solution is to retrain the model.\n\nAnd these training phases can be extremely costly: in the case of the <b>GPT-3 <\/b><a href=\"https:\/\/liora.io\/en\/artificial-intelligence-definition\">artificial intelligence system<\/a> developed by OpenAI, the training phase cost around 16 million euros&#8230;\n\nData poisoning does not just carry an <b>economic cost<\/b>; it can also represent an <b>even greater danger<\/b>. Artificial intelligence and machine learning models are becoming increasingly important in our societies, and are being used for tasks of the utmost importance, such as healthcare, transport, and criminal investigations. 
For example, the Chicago police use AI to fight crime by predicting where and when violent crimes will break out.&nbsp;\n\nWhat happens if the data in their models is poisoned? Crime-fighting becomes ineffective, and the models steer police officers in the wrong direction.\n<h3>How can we protect ourselves from data poisoning?<\/h3>\nFortunately, there are ways to combat data poisoning.&nbsp;\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The first technique is to <b>check the databases<\/b> before injecting them into the model&#8217;s training data. This can be done using statistical methods to detect anomalies in the data, regression tests, or manual moderation.&nbsp;<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">You can also spot any drop in <b>model performance<\/b> during the training phase and react immediately, thanks to cloud tools such as <b>Azure Monitor<\/b> or <b>Amazon SageMaker<\/b>.&nbsp;<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Finally, as data poisoning requires prior knowledge of the model, it is important to keep the model&#8217;s operating details secret during the training phase.<\/li>\n<\/ul>\nData poisoning therefore represents a real IT threat, all the more so as these attacks are becoming <b>increasingly accessible to hackers<\/b>. The challenge is to keep pace with the technical progress made by attackers and to improve prevention systems. <a href=\"https:\/\/liora.io\/en\/data-scientist-salary\">Data Scientists<\/a> and <a href=\"https:\/\/liora.io\/en\/data-engineer-role-skills-salary\">Data Engineers<\/a> are on the front line in combating these attacks. They are the ones who must <b>collect secure data<\/b> and <b>detect attacks<\/b> during the training phases. 
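As a minimal sketch of the first defense listed above (statistical anomaly detection on incoming training data), the snippet below screens a batch of feature values with a median-based modified z-score, which stays robust even when an extreme poisoned value inflates the spread. The batch values and the 3.5 threshold are illustrative assumptions, not a production rule:

```python
# Screening a training batch for statistical outliers before it is
# injected into the model's training data (illustrative sketch).
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD) is extreme."""
    med = median(values)
    # Median absolute deviation: robust spread estimate, not inflated
    # by the very outliers we are trying to catch.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical feature column from a training batch; one injected
# record stands far outside the normal range.
feature = [0.9, 1.1, 1.0, 0.95, 1.05, 1.02, 0.98, 50.0]
print(flag_outliers(feature))  # -> [7], the injected 50.0
```

Flagged rows would then go to the regression tests or manual moderation mentioned above rather than straight into training.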
If you&#8217;d like to find out more about how these models work and how to protect them, take a look at our training courses in the data professions.\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Start a training in Data Science<\/a><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Among the many computer attacks that exist and that attack IT systems, Data Poisoning is characterized by the falsification of training data for Machine Learning models. What does this mean? Does it represent a real danger? Here&#8217;s a brief overview of this particular attack, the threats it poses, and how to defend against it. What [&hellip;]<\/p>\n","protected":false},"author":74,"featured_media":168693,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2426],"class_list":["post-168689","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cybersecurity"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/168689","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/74"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=168689"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/168689\/revisions"}],"predecessor-version":[{"id":206394,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/168689\/revisions\/206394"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-
json\/wp\/v2\/media\/168693"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=168689"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=168689"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}