{"id":185048,"date":"2024-06-11T14:20:59","date_gmt":"2024-06-11T13:20:59","guid":{"rendered":"https:\/\/liora.io\/en\/?p=185048"},"modified":"2026-02-06T07:58:23","modified_gmt":"2026-02-06T06:58:23","slug":"ci-cd-for-kubeflow","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/ci-cd-for-kubeflow","title":{"rendered":"CI\/CD for Kubeflow (part 2) <\/br><i><small>by Tony Ruiz, GCP Customer Engineer at Google<\/small><\/i>"},"content":{"rendered":"This is the second part of a series&nbsp;\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/liora.io\/en\/introduction-to-kubeflow-for-mlops\">Part 1 here<\/a><\/li>\n<\/ul>\n<style><br \/>\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style>\n<h3>Motivation<\/h3>\nThe previous article discusses the complexities of the <a href=\"https:\/\/liora.io\/en\/bagging-machine-learning-what-is-it-about\">machine learning life cycle<\/a> and how <strong>Vertex Pipelines<\/strong>, a managed version of Kubeflow on Google Cloud, can address operationalizing a machine learning model. The next step in the journey is to automate the training and deployment of the system in a manner that facilitates and enables reproducibility and time to market. 
In traditional software development, these capabilities would be enabled by a system that allows for&nbsp;\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Continuous Integration (CI)<\/strong> &#8211; the ability to test and validate checked-in code that needs to be merged into a central repository<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Continuous Deployment (CD)<\/strong> &#8211; the ability to create and deploy the system to an appropriate environment<\/li>\n<\/ul>\nSince machine learning models depend on data, the data itself must be accounted for in the <a href=\"https:\/\/liora.io\/en\/all-about-ci-cd\">CI\/CD process<\/a>. In particular:\n<ul>\n \t<li>Data must be tested and validated for quality and <a href=\"https:\/\/developers.google.com\/machine-learning\/guides\/rules-of-ml#rule_37_measure_trainingserving_skew\">training\/serving skew<\/a><\/li>\n \t<li>A model must be continuously trained on new data to account for changes in the world<\/li>\n \t<li>The model must be validated, tested and deployed to a prediction service<\/li>\n<\/ul>\n<style><br \/>\n.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=\".svg\"]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/style>\n<figure>\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"800\" height=\"240\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image1-1024x307.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image1-1024x307.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image1-300x90.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image1-768x230.png 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image1.png 1313w\" sizes=\"(max-width: 800px) 100vw, 800px\"><figcaption>Image from <a 
href=\"https:\/\/cloud.google.com\/architecture\/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build\">Architecture for MLOps using TensorFlow Extended, Vertex AI Pipelines, and Cloud Build<\/a><\/figcaption><\/figure>\nThis also introduces the notion of continuous training. Depending on the use case, we need our model to be trained and updated on a frequency.\n<h3>Purpose<\/h3>\nThe goal of this article is to serve as a gentle introduction to <strong>CI\/CD for the Vertex Pipelines<\/strong> created in the previous article. To address all the components in a comprehensive CI\/CD workflow in the image above would go beyond the scope of a short blog. This article assumes the reader has foundational knowledge on concepts surrounding git, docker, and cloud technologies.\n<h3>What is Cloud Build ?<\/h3>\nCloud build is a <strong>fully managed, serverless CI\/CD platform<\/strong>. Other potential include Jenkins, <a href=\"https:\/\/liora.io\/en\/azure-devops-definitions-devops-methods\">Azure DevOps<\/a> and Github Actions. A CI\/CD platform allows for teams to effectively and iteratively develop and ship software to different environments. This tutorial will use Google Cloud Build alongside a github repository to build and run containers using the code that was pushed to a main branch.\n<h3>A Minimalistic CI\/CD\/CT Workflow<\/h3>\n<h4>Prerequisites<\/h4>\n<ul>\n \t<li>A github account and repository\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"2\">This article assumes the reader has some foundational knowledge in git. <a href=\"https:\/\/www.atlassian.com\/git\/tutorials\/setting-up-a-repository\">https:\/\/www.atlassian.com\/git\/tutorials\/setting-up-a-repository<\/a><\/li>\n<\/ul>\n<\/li>\n \t<li>Google cloud environment<\/li>\n \t<li style=\"list-style-type: none;\">\n<ul>\n \t<li>If you are new to Google Cloud, refer to <a href=\"https:\/\/cloud.google.com\/docs\/get-started\">this page<\/a> to get started. 
In this example, we are going to be developing our pipeline on a <a href=\"https:\/\/cloud.google.com\/vertex-ai-notebooks?hl=en\">Vertex AI Notebook<\/a>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\nRecall from the previous post that the goal was to build an end-to-end ML workflow where&nbsp;\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">A model is trained&nbsp;<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Evaluated against a metric&nbsp;<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Deployed to a compute endpoint where predictions can be made<\/li>\n<\/ul>\nNote that in some instances, the deployment workflow may be separate from the training workflow. For example, you may want your machine learning model to be trained on a weekly basis but go through an extensive quality assurance process before deploying.\n<h4>Rewrite our pipeline workflow<\/h4>\nThe first thing we will need to do is <strong>transform the<\/strong> <strong>Python notebook file (.ipynb)<\/strong> <strong>into a normal Python file (.py)<\/strong>.&nbsp; This Python file contains all the executable code needed to recreate the compiled pipeline file that will actually be used to execute our machine learning workflow. We do this because version control tools work much better with plain text files than with notebook files, which are a mixture of text and binary outputs. Refer to the GitHub repository here for the full code.\n\n<a href=\"\/en\/courses\/data-ai\/\">\nBrowse our training courses\n<\/a>\n<h4>Create a Dockerfile<\/h4>\nThe next step is to create a Dockerfile that contains our Python file <a href=\"\/\">pipeline.py<\/a> and our requirements.txt. 
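The article does not show the contents of requirements.txt. As a rough sketch, it presumably pins the libraries that pipeline.py imports; the exact package list below is an assumption, not taken from the repository:

```text
kfp
google-cloud-aiplatform
```

At a minimum, the Kubeflow Pipelines SDK and the Vertex AI client library are needed to compile the pipeline and interact with Vertex AI.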
Note that we are creating a separate folder (.\/kfp-cli) within our workspace to house the files for our Docker image.\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>%%writefile kfp-cli\/Dockerfile\nFROM gcr.io\/deeplearning-platform-release\/base-cpu\nWORKDIR \/kfp-cli\nADD pipeline.py .\/pipeline.py\nADD requirements.txt .\/requirements.txt\nRUN pip install -r requirements.txt\nCMD python .\/pipeline.py<\/xmp>\n\t\t\t\t<\/code>\n<\/pre>\n<h4>Build a local image and push it to Artifact Registry<\/h4>\nNow that a Dockerfile is defined and the dependencies exist within a directory, create a local image and push it to <a href=\"https:\/\/cloud.google.com\/artifact-registry\">Artifact Registry<\/a>.\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>IMAGE_NAME='kfp-truiz-mlops-v2'\nTAG='latest'\nIMAGE_URI='gcr.io\/{}\/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)\n!gcloud builds submit --timeout 15m --tag {IMAGE_URI} .\/kfp-cli<\/xmp>\n\t\t\t\t<\/code>\n<\/pre>\nThe Cloud Build page in the console will show the details of the build.\n<h4>Build a cloudbuild.yaml file<\/h4>\nThe next step is to create the <strong>cloudbuild.yaml file<\/strong>. This yaml file will contain the steps to execute the build when code gets pushed to the remote repository. For more information on creating the cloudbuild.yaml file, refer to the <a href=\"https:\/\/cloud.google.com\/build\/docs\/configuring-builds\/create-basic-configuration\">documentation<\/a>.&nbsp;\n\nIn short, this build will do a few things:\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Build the Docker image using the code in the repository. If there are errors in the code, this build step will fail<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Push the most recent version of the Docker image back to Artifact Registry<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">Use the image that was created to run the Python script in the repository. 
The output of this execution will be the pipeline yaml file&nbsp;<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">The next step is to write the pipeline yaml file to a Google Cloud Storage bucket. This yaml file can then be picked up by a scheduler to trigger the pipeline.<\/li>\n<\/ul>\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>steps:\n  # Step 1: Build the Docker image\n  - name: 'gcr.io\/cloud-builders\/docker'\n    args: ['build', '-t', 'gcr.io\/$_PROJECT_ID\/kfp-truiz-mlops-v2:latest', '.\/kfp-cli']\n  # Step 2: Push the image back to the registry \n  - name: 'gcr.io\/cloud-builders\/docker'\n    args: ['push', 'gcr.io\/$_PROJECT_ID\/kfp-truiz-mlops-v2:latest']\n  # Step 3: Run the Python file within the container\n  - name: 'gcr.io\/$_PROJECT_ID\/kfp-truiz-mlops-v2:latest'\n    args: ['python', '.\/kfp-cli\/pipeline.py']  \n    env:\n    - 'PROJECT_ID=$_PROJECT_ID'\n    - 'REGION=$_REGION'\n  # Step 4: Write results to Google Cloud Storage\n  - name: 'gcr.io\/cloud-builders\/gsutil'\n    args: ['cp', '.\/xgb-pipeline.yaml', 'gs:\/\/$_PROJECT_ID-cloudbuild-pipelines'] \noptions:\n    logging: CLOUD_LOGGING_ONLY\nsubstitutions:\n   _PROJECT_ID: '<PROJECT_ID>'\n   _REGION: 'us-central1'<\/xmp>\n\t\t\t\t<\/code>\n<\/pre>\n<h4>Submit a local job to Cloud Build<\/h4>\nOnce the Dockerfile is defined and the cloudbuild.yaml created, a local build can be submitted using the gcloud command-line interface. 
The command is:\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>!gcloud builds submit .\/kfp-cli --config cloudbuild.yaml<\/xmp>\n\t\t\t\t<\/code>\n<\/pre>\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Discover our training courses<\/a><\/div><\/div>\n\n<h4>Connect Google Cloud to our GitHub Repository<\/h4>\nNow that the cloudbuild file is defined, there needs to be an integration between the Cloud Build environment and the git repository. Fortunately, this is straightforward to set up <a href=\"https:\/\/cloud.google.com\/build\/docs\/automating-builds\/github\/connect-repo-github#connecting_a_github_host\">via the console<\/a>.&nbsp;\n<p style=\"padding-left: 40px;\">The first step is <strong>to create a host connection to the git environment<\/strong>. This will in turn handle the authentication mechanism to the repository. You will be redirected to your GitHub repository to allow the connection. An authentication token from GitHub will be created and stored in this project as a Secret Manager secret. You may revoke access to Cloud Build through GitHub at any time.&nbsp;<\/p>\n<p style=\"padding-left: 40px;\">The second step is <strong>to link the repository<\/strong>. 
You will need to <a href=\"https:\/\/github.com\/marketplace\/google-cloud-build\">install the Google Cloud Build<\/a> application within GitHub to allow the integration with the selected repository.<\/p>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1342\" height=\"1313\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image3.png\" alt=\"\" loading=\"lazy\">\n\nAfter that step is complete, you will be able to link the repository to your Google Cloud Build host connection.\n\n<img decoding=\"async\" width=\"988\" height=\"356\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image2.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image2.png 988w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image2-300x108.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image2-768x277.png 768w\" sizes=\"(max-width: 988px) 100vw, 988px\">\n<h4>Create a Cloud Build Trigger<\/h4>\nFor details, see <a href=\"https:\/\/cloud.google.com\/build\/docs\/automating-builds\/create-manage-triggers\">the trigger documentation<\/a>.\n\nNow every time a push is made to the main branch, a build process will kick off. This is done through <a href=\"https:\/\/docs.google.com\/document\/u\/0\/d\/1gAWZIUw8TYzoC1QZX4QVo4ziTQ0uPZ2mFEs2GxrzenM\/edit\">Cloud Build triggers<\/a>, which allow the user to trigger specific actions on connected repositories. In the image below, a trigger is created to execute a build every time there\u2019s a push to a branch. Note that specific branches can be defined using regex. 
The cloudbuild.yaml file is also specified as the configuration for the build.\n\n<img decoding=\"async\" width=\"1341\" height=\"1312\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image5.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image5.png 1341w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image5-300x294.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image5-1024x1002.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image5-768x751.png 768w\" sizes=\"(max-width: 1341px) 100vw, 1341px\">\n\nOnce the connection to the remote repository is established and a trigger is created, the next step is to push a commit to the remote repository. This will execute a build using the code that was pushed. The changes will be used to create a new image, push it to the registry, run the pipeline file, and move the compiled pipeline specification to storage. All that is needed now is to execute the pipeline using a scheduler or an event.\n\n<img decoding=\"async\" width=\"1822\" height=\"1313\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4.png 1822w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4-300x216.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4-1024x738.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4-768x553.png 768w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/05\/image4-1536x1107.png 1536w\" sizes=\"(max-width: 1822px) 100vw, 1822px\">\n<h4>Configure a schedule to execute the pipeline<\/h4>\nThe next step is to actually execute the pipeline file. 
How this is done can vary depending on the use case.&nbsp;\n<ul>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">We may want to execute a scheduled training pipeline weekly, and deploy our model only when a model artifact has made it up the proper approval channels.<\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\">We may want to retrain a model after a batch process is complete. For example, when data in our data warehouse gets updated, a trigger would occur that trains the model.<\/li>\n<\/ul>\nWe can set up a recurring schedule with the Scheduler API, or we can create an event-driven workflow using a messaging\/queue service like Pub\/Sub. For our use case, a weekly schedule will suffice.\n<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>pipeline_job = aiplatform.PipelineJob(\n    template_path=COMPILED_PIPELINE_PATH,\n    pipeline_root=PIPELINE_ROOT_PATH,\n    display_name=\"xgb-pipeline\",\n)\npipeline_job_schedule = pipeline_job.create_schedule(\n    display_name=\"weekly_training_and_deployment\",\n    cron=\"30 18 * * 6\",  # 18:30 UTC every Saturday\n    max_concurrent_run_count=1,\n    max_run_count=1,\n)\n<\/xmp>\n\t\t\t\t<\/code>\n<\/pre>\n<h3>Conclusion<\/h3>\nCongratulations, this article covered a lot of ground. We went from training a machine learning model through a manual process to automating training and deployment and integrating our workflow with a code repository. This article did not cover an extensive list of all that can be done in a <strong>CI\/CD workflow<\/strong>. 
For example, a more mature pipeline would handle things like unit tests, deployment to different environments, and testing the data for training\/serving skew. It does, however, demonstrate a starting point to work towards those capabilities.\n\n<a href=\"\/en\/courses\/data-ai\/\">\nStart a training course\n<\/a>\n<h3>References:<\/h3>\n<ul data-stringify-type=\"unordered-list\" data-indent=\"0\" data-border=\"0\">\n \t<li><a href=\"https:\/\/medium.com\/google-cloud\/enterprise-mlops-with-google-cloud-vertex-ai-part-3-ci-cd-33d5e6e774a7\">https:\/\/medium.com\/google-cloud\/enterprise-mlops-with-google-cloud-vertex-ai-part-3-ci-cd-33d5e6e774a7<\/a><\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/architecture\/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build\">Architecture for MLOps using TensorFlow Extended, Vertex AI Pipelines, and Cloud Build<\/a><\/li>\n \t<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/medium.com\/google-cloud\/how-to-implement-ci-cd-for-your-vertex-ai-pipeline-27963bead8bd\">https:\/\/medium.com\/google-cloud\/how-to-implement-ci-cd-for-your-vertex-ai-pipeline-27963bead8bd<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>This is the second part of a series&nbsp; Part 1 here Motivation The previous article discusses the complexities of the machine learning life cycle and how Vertex Pipelines, a managed version of Kubeflow on Google Cloud, can address operationalizing a machine learning model. 
The next step in the journey is to automate the training and [&hellip;]<\/p>\n","protected":false},"author":74,"featured_media":185054,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-185048","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/185048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/74"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=185048"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/185048\/revisions"}],"predecessor-version":[{"id":205699,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/185048\/revisions\/205699"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/185054"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=185048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=185048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}