{"id":178327,"date":"2024-03-06T22:10:12","date_gmt":"2024-03-06T21:10:12","guid":{"rendered":"https:\/\/liora.io\/en\/?p=178327"},"modified":"2026-02-17T15:28:47","modified_gmt":"2026-02-17T14:28:47","slug":"the-unit-test-in-data-analysis","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/the-unit-test-in-data-analysis","title":{"rendered":"The Unit Test in data analysis"},"content":{"rendered":"\n<p class=\"is-style-h2\">Definition of Unit Test<\/p>\n\n\n\n<p>In <strong>computer programming,<\/strong> unit testing is the procedure used to ensure that software or source code (or part of software or part of code) functions correctly. Having long been considered a secondary task, unit testing has become a common and even essential practice, now an integral part of the standard development lifecycle.<\/p>\n\n\n\n<p><strong>Unit testing<\/strong> is an essential part of any source code to ensure that the application works as intended in the specification, despite any changes to the source code. In addition, good software development practices such as <strong>Test Driven Development (TDD)<\/strong> and <a href=\"https:\/\/liora.io\/en\/become-a-devops-engineer-tasks-and-skills\">DevOps<\/a> rely heavily on unit testing.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"\/en\/courses\/data-ai\/\">Discover our training courses<\/a><\/div>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"the-role-and-criteria-of-unit-testing\">The role and criteria of unit testing<\/h2>\n\n\n\n<p><strong>Unit tests<\/strong> are developed to compare the final version delivered with the specification normally provided in the early phases of an<strong> IT project.<\/strong> They are used to determine whether or not a verification has been successful. A test can therefore be understood as a criterion for validating the expectations of a <strong>programme or computer code.<\/strong><\/p>\n\n\n\n<p>Mike Cohn, one of the main contributors to the Agile Scrum methodology, places unit tests at the base of his Test Pyramid (in which the next levels correspond to service and interface tests).<\/p>\n\n\n\n<p>Among other things, unit testing makes it possible to<\/p>\n\n\n\n<p><strong>Quickly identify errors:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Several methods (including the eXtreme Programming method) recommend writing tests at the same time as, or even before, the function to be tested, to facilitate debugging.<\/li>\n<\/ul>\n\n\n\n<p><strong>Facilitate maintenance:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>After a code modification, tests can be used to quickly identify any regressions (failed tests).<\/li>\n<\/ul>\n\n\n\n<p><strong>Complete code documentation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tests can often be used to highlight the use of a function. In this way, they can be used to supplement feature documentation.<\/li>\n<\/ul>\n\n\n\n<p><strong>A number of criteria must be met in order to properly define a unit test:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unit:<\/strong> Unit testing focuses on the smallest identifiable element of an application (referred to as a unit). However, several elements or lines of code can be used to define a unit, which can then be a function, a class method, a module, an object, etc. This is why unit tests are called low-level tests (as opposed to high-level tests, which check the validity of one or more more more complex functionalities).<\/li>\n\n\n\n<li><strong>White-box testing:<\/strong> The test must be carried out in a white box to reflect the fact that the quality engineer often in charge of the test or the developer himself must have knowledge of the part of the code (of the unit) to be tested.<\/li>\n\n\n\n<li><strong>Isolation:<\/strong> The tests must be carried out in isolation, because each unit tested must be independent of the other tests. The suite of unit tests must be able to be run in any order without affecting the results of the following or previous tests.<\/li>\n\n\n\n<li><strong>Speed:<\/strong> the various tests must be able to run quickly, so that they can be launched whenever the source code is modified.<\/li>\n\n\n\n<li><strong>Idempotence:<\/strong> idempotence means that an operation has the same effect each time it is applied. The test must therefore be independent of the environment or the number of times it is run.<\/li>\n\n\n\n<li><strong>Automated:<\/strong> The unit test must be able to produce a direct result in terms of success or failure and not require manual intervention by the developer to conclude.<\/li>\n<\/ul>\n\n\n\n<p>To sum up, we can say that the <strong>real value of testing<\/strong> is appreciated when major changes are made to the source code of an application or system. Testing then helps to ensure that the expected behaviour is maintained. Unit testing therefore helps to reduce the level of uncertainty.<\/p>\n\n\n\n<p>While testing methodologies and frameworks are now well established in software development, how and where does unit testing fit in with data science?<\/p>\n\n\n\n<p>Related articles:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><a href=\"https:\/\/liora.io\/en\/folium-discover-the-open-source-python-library\" target=\"_blank\" rel=\"noopener\">Folium: Discover the open source Python library <\/a><\/td><\/tr><tr><td><a href=\"https:\/\/liora.io\/en\/matplotlib-master-data-visualization-in-python\" target=\"_blank\" rel=\"noopener\">Matplotlib: Master Data Visualization in Python <\/a><\/td><\/tr><tr><td><a href=\"https:\/\/liora.io\/en\/python-all-you-need-to-know\" target=\"_blank\" rel=\"noopener\">Python Crash Course: Get started <\/a><\/td><\/tr><tr><td><a href=\"https:\/\/liora.io\/en\/machine-learning-python-where-to-start\" target=\"_blank\" rel=\"noopener\">Mastering Machine Learning in Python: Data-Driven Success <\/a><\/td><\/tr><tr><td><a href=\"https:\/\/liora.io\/en\/python-programming-for-beginners-episode-3\" target=\"_blank\" rel=\"noopener\">Python Programming for Beginners &#8211; Episode 3<\/a><\/td><\/tr><tr><td><a href=\"https:\/\/liora.io\/en\/django-all-about-the-python-web-development-framework\" target=\"_blank\" rel=\"noopener\">Django: All about the Python web development framework<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"\/en\/courses\/data-ai\/\">Discover our training courses<\/a><\/div>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"machine-learning-unit-tests\">Machine learning unit tests<\/h3>\n\n\n\n<p>In <a href=\"https:\/\/liora.io\/en\/software-engineer-everything-you-need-to-know-about-this-role-2\">traditional software development,<\/a> the key element that influences the behaviour of the system is the source code. Machine Learning technologies have introduced 2 new elements that also have a direct impact on the system&#8217;s behaviour. These are the model used and the data. Furthermore, the behaviour of Machine Learning models is not specifically defined by lines of code, but mainly by the data used in the training phase.<\/p>\n\n\n\n<p>This is why systems based on <a href=\"https:\/\/liora.io\/en\/automl-and-machine-learning-automation-a-threat-to-data-scientists\">Machine Learning technologies<\/a> require additional testing, because the rules governing the behaviour of the overall system are less explicit than the rules of traditional systems.&lt;br&gt;.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=&#8221;.svg&#8221;]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized is-style-not-rounded\" style=\"margin-top:var(--wp--preset--spacing--columns);margin-bottom:var(--wp--preset--spacing--columns)\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/schema_1.png\" alt=\"test unitaire\" style=\"width:1000px;height:auto\" title=\"\" \/><\/figure>\n\n\n\n<p><a href=\"https:\/\/liora.io\/en\/machine-learning-engineer-all-about-the-job\">Machine Learning<\/a> work loops very often straddle 3 environments:<\/p>\n\n\n\n<p><strong>The research and development environment:<\/strong><\/p>\n\n\n\n<p>This is most often the <a href=\"https:\/\/liora.io\/en\/jupyter-notebook-an-indispensable-code-sharing-tool\">Jupyter Notebook<\/a> used to develop the various models and approaches. However, while it is technically possible to build and run unit tests on basic functionalities (of a class, for example, in <a href=\"https:\/\/liora.io\/en\/ipython-discover-the-python-shell-at-the-heart-of-jupyter-notebook\">Python)<\/a>, most unit tests are generally run in the development and production environments.<\/p>\n\n\n\n<p><strong>The development environment:<\/strong><\/p>\n\n\n\n<p>This is the environment in which most tests are carried out. There are generally several levels of testing in this environment:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unit tests<\/strong>, which form the basis (as illustrated in the diagram below)<\/li>\n\n\n\n<li><strong>Integration tests <\/strong>to ensure that the various components function optimally<\/li>\n\n\n\n<li><strong>System tests<\/strong> to ensure that the overall system functions correctly<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized is-style-not-rounded\" style=\"margin-top:var(--wp--preset--spacing--columns);margin-bottom:var(--wp--preset--spacing--columns)\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/schema_2.png\" alt=\"test unitaire\" style=\"width:auto;height:350px\" title=\"\" \/><\/figure>\n\n\n\n<p>The <strong>question then arises of where to position the cursor for defining the number of tests<\/strong>. To define the optimum number of tests, we can use the following questions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Does the test base make it possible to reduce uncertainty about the operation of our system?<\/li>\n\n\n\n<li>What are the critical and main tasks of our system?<\/li>\n\n\n\n<li>Are the priorities clearly identified?<\/li>\n<\/ul>\n\n\n\n<p>With regard to <a href=\"https:\/\/liora.io\/en\/machine-learning-what-is-it-and-why-does-it-change-the-world\">machine learning issues<\/a>, examples of unit tests that can be carried out are given below:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Testing the format and type of data<\/li>\n\n\n\n<li>Testing the parameters of the <a href=\"https:\/\/liora.io\/en\/scipy-all-about-the-python-machine-learning-library\">Machine Learning model<\/a><\/li>\n\n\n\n<li>Testing input variables<\/li>\n\n\n\n<li>Testing model quality<\/li>\n\n\n\n<li>The production environment: This is the environment in which users of an application will have access to it. In this environment, the tests carried out to ensure that the overall system is working properly are generally much more complex.<\/li>\n<\/ul>\n\n\n<h2 class=\"wp-block-heading\" id=\"unit-test-libraries-in-python\">Unit Test libraries in Python<\/h2>\n\n\n\n<p><a href=\"https:\/\/liora.io\/en\/tpot-all-about-this-machine-learning-python-library\">In Python, the two main libraries<\/a> used to run unit tests are Unittest and Pytest. Using the unittest library to run tests requires a good knowledge of Python&#8217;s object-oriented programming, because the tests are run via a class.<\/p>\n\n\n\n<p>In contrast, using the <a href=\"https:\/\/liora.io\/en\/pytest-for-python-why-and-how-to-use-it\">Pytest library<\/a> offers a little more flexibility in the design and implementation of tests. In addition to the fact that running tests with the Pytest library is much more flexible, the Pytest library offers more test commands and options (assert instruction).<\/p>\n\n\n\n<p>To illustrate simple use cases for these two libraries, we&#8217;ll look at the following two examples:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Test of a function calculating the square of an integer: We will test that the values given as output are integers (test carried out with Pytest).<\/li>\n\n\n\n<li>Testing the type and ranges of input data<strong> (using unittest)<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Although far from exhaustive, these two examples will give you a better idea of some of the ways in which these libraries can be used.<\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"example-of-a-unit-test-with-pytest\">Example of a Unit test with Pytest<\/h2>\n\n\n\n<p>Suppose, for example, that as part of a project, developers have created a function to calculate the square of an integer, and we want to run tests to ensure that the function&#8217;s outputs are integers. So they decide to run tests on a list of values.<\/p>\n\n\n\n<p><strong>Step 1:<\/strong> Define the number_squared function in function.py<\/p>\n\n\n\n<p><strong>Step 2:<\/strong> Defining the test function in the test_in_int.py file<\/p>\n\n\n\n<p>Executing the pytest command in a terminal returns the following result:<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\" style=\"margin-top:var(--wp--preset--spacing--columns);margin-bottom:var(--wp--preset--spacing--columns)\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/test-unitaire-ecran.png\" alt=\"test unitaire\" style=\"width:1000px;height:auto\" title=\"\" \/><\/figure>\n\n\n\n<p>The only constraint imposed by the use of the Pytest library is that the name of the test file must begin with the prefix &#8216;test_&#8217; (the full name would be test_file_name.py). In this way, whenever the function is modified or updated, the test function can be used to ensure that the expected behaviour of the function remains consistent.<\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"example-of-a-test-with-unittest\">Example of a test with unittest<\/h2>\n\n\n\n<p>In this example, we&#8217;re going to use the <strong>unittest library<\/strong> to run a test with an example dataset. We will use the iris dataset (click here for more details). The test we want to implement is to check the type and extent of the input data for a Machine Learning model.<\/p>\n\n\n\n<p>Let&#8217;s imagine, for example, that the Data Scientist in a team has developed a model for classifying three species of flowers. However, the data (precisely the variables in our dataset) used to develop the model were of a certain type and spanned a certain interval.<\/p>\n\n\n\n<p>If, by any chance, the information provided by users of our model were very different from the training data (known in English as data drift), the Machine Learning model put into production could no longer be relevant and provide inconsistent predictions. This is why it is necessary during the development phase to define a test to ensure that the input data is consistent with the training data.<\/p>\n\n\n\n<p>From the overview and description of the dataset below, we can see that the variables are floats (real numbers) with values generally between 1 and 8.<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/apercu-du-jeu-de-donnees.png\" alt=\"aper\u00e7u du jeu de donn\u00e9es\" style=\"width:1000px;height:auto\" title=\"\" \/><figcaption class=\"wp-element-caption\">Overview of the dataset<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/Description-du-jeu-de-donnees.png\" alt=\"description du jeu de donn\u00e9es\" style=\"width:1000px;height:auto\" title=\"\" \/><figcaption class=\"wp-element-caption\">Description of the data set<\/figcaption><\/figure>\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-definition-of-a-dictionary-of-variable-types-and-ranges\">Step 1: Definition of a dictionary of variable types and ranges<\/h3>\n\n\n\n<p>In a dictionary structure, we will define both the type and the range of each variable in line with the description of the dataset above (Analogy with the definition domain in mathematics).<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-creation-of-a-data-processing-and-training-pipeline-for-a-logistic-regression-model\">Step 2: Creation of a data processing and training pipeline for a logistic regression model<\/h3>\n\n\n\n<p>This stage can be broken down into several phases, but we have grouped all the phases together in a single class.<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-defining-the-test-class-for-our-test-campaign\">Step 3: Defining the test class for our test campaign<\/h3>\n\n\n\n<p>With the unittest library, you need to define a test class:<\/p>\n\n\n<h3 class=\"wp-block-heading\" id=\"stage-4-running-the-unit-tests\">Stage 4: Running the unit tests<\/h3>\n\n\n\n<p>To run the test inside a Jupyter Notebook, simply use the code below:<\/p>\n\n\n\n<p>Running the code gives us the following display:<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/liora.io\/app\/uploads\/2022\/05\/test-unitaire-final.png\" alt=\"test unitaire\" style=\"width:1000px;height:auto\" title=\"\" \/><\/figure>\n\n\n<h2 class=\"wp-block-heading\" id=\"conclusion-about-unit-tests\">Conclusion about Unit Tests<\/h2>\n\n\n\n<p>Unit testing is an essential step in reducing uncertainty and ensuring consistency between the input specification and the final product. It&#8217;s easy to see why this stage is so <a href=\"https:\/\/liora.io\/en\/agile-coach-roles-skills-training-career-opportunities\">important in agile methods.<\/a><\/p>\n\n\n\n<p>In the development process for Machine Learning applications, unit tests are most often carried out during the development phase by a Data Scientist or Data Engineer to guarantee the system&#8217;s performance and ensure that the results obtained are consistent with the input data.<\/p>\n\n\n\n<p>If you&#8217;re interested in <strong>becoming a Data Engineer,<\/strong> find out more about the career path designed by Liora on the dedicated page.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"\/en\/courses\/data-ai\/\">Discover our training courses<\/a><\/div>\n<\/div>\n\n\n\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What is a unit test?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"A unit test is a software testing method that consists of verifying the proper functioning of a specific and isolated part of a program, usually a function or a method. In data analysis, unit tests are used to ensure that data transformations, calculations, and processing steps behave as expected. By validating each component independently, developers and data analysts can detect errors early and ensure the reliability of analytical workflows.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Why use unit testing in data analysis?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Unit testing in data analysis helps guarantee the accuracy and consistency of data processing pipelines. Analytical projects often involve multiple transformations, aggregations, and modeling steps, which can introduce errors if not properly validated. By implementing unit tests, teams can prevent regressions, improve code quality, and ensure that results remain reliable when modifications are made to the codebase.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How to implement unit tests in data analysis?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"To implement unit tests in data analysis, each function or processing step should be tested with controlled input data and expected outputs. This involves creating test cases that cover normal scenarios, edge cases, and potential failure situations. Automated testing frameworks allow analysts to systematically execute these tests and quickly identify discrepancies between expected and actual results.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Tools for unit testing in data analysis\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Several tools support unit testing in data analysis environments. In Python, frameworks such as unittest and pytest are widely used to create and run tests. For data validation specifically, libraries like Great Expectations help ensure data quality. These tools facilitate automated testing, continuous integration, and systematic validation of analytical workflows.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Conclusion\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Unit testing is a fundamental practice in data analysis projects. By validating each component of a data pipeline, analysts can improve code reliability, prevent errors, and maintain trust in analytical results. Integrating unit tests into the development process strengthens overall data governance and ensures more robust and maintainable analytical systems.\"\n      }\n    }\n  ]\n}\n<\/script>\n\n","protected":false},"excerpt":{"rendered":"<p>Definition of Unit Test In computer programming, unit testing is the procedure used to ensure that software or source code (or part of software or part of code) functions correctly. Having long been considered a secondary task, unit testing has become a common and even essential practice, now an integral part of the standard development [&hellip;]<\/p>\n","protected":false},"author":50,"featured_media":178329,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2434],"class_list":["post-178327","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-dev"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/50"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=178327"}],"version-history":[{"count":4,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178327\/revisions"}],"predecessor-version":[{"id":207072,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/178327\/revisions\/207072"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/178329"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=178327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=178327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}