{"id":184236,"date":"2024-04-10T06:30:00","date_gmt":"2024-04-10T05:30:00","guid":{"rendered":"https:\/\/liora.io\/en\/?p=184236"},"modified":"2026-02-06T08:11:14","modified_gmt":"2026-02-06T07:11:14","slug":"introduction-to-kubeflow-for-mlops","status":"publish","type":"post","link":"https:\/\/liora.io\/en\/introduction-to-kubeflow-for-mlops","title":{"rendered":"Introduction to Kubeflow for MLOps<br><i><small>by Tony Ruiz, GCP Customer Engineer at Google<\/small><\/i>"},"content":{"rendered":"<h3>Motivation<\/h3>\t\t\n\t\t<p><strong><a href=\"https:\/\/liora.io\/en\/manifold-learning-what-is-this-machine-learning-method\">Machine Learning Development<\/a> goes beyond developing a model. Productionizing a model quickly becomes a multi-faceted, indeed even a multi-disciplinary process that involves several key stakeholders across both the business and IT. As depicted in the image below, the code to actually develop a model is a small component compared to other complexities such as configuring serving infrastructure, data verification, and monitoring. <\/strong><\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1074\" height=\"452\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image1.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image1.png 1074w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image1-300x126.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image1-1024x431.png 1024w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image1-768x323.png 768w\" sizes=\"(max-width: 1074px) 100vw, 1074px\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t<h3>Purpose<\/h3>\t\t\n\t\t<p>The goal of this article is to serve as a gentle introduction to <strong>Kubeflow pipelines<\/strong> and how it can address the challenges in <strong>ML Learning Lifecycle<\/strong> development. To address all the components in the image above would go beyond the scope of a short blog. 
For the sake of simplicity, we are going to reference Vertex Pipelines on Google Cloud for our example. Note that <a href=\"https:\/\/www.kubeflow.org\/docs\/started\/installing-kubeflow\/\">Kubeflow<\/a> is an open-source project and can be deployed in several different environments.<\/p>\t\t\n\t\t\t<h3>What is Kubeflow?<\/h3>\t\t\n\t\t<p>Kubeflow is an open-source <a href=\"https:\/\/liora.io\/en\/mlops-devops-applied-to-machine-learning-projects\">MLOps<\/a> (<strong>Machine Learning Operations<\/strong>) platform built on top of <a href=\"https:\/\/liora.io\/en\/why-kubernetes-has-become-an-indispensable-tool-in-data-science\">Kubernetes<\/a>. It simplifies and automates various stages of the machine learning lifecycle, <strong>from data preparation and model training to deployment and monitoring<\/strong>. Think of it as a toolkit that helps you build and manage your ML workflows efficiently and portably across different platforms.<\/p><p>Here are some key aspects of Kubeflow:<\/p><ul><li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Components<\/strong>: It provides a collection of components for different stages of the ML lifecycle, such as pipelines for <a href=\"https:\/\/liora.io\/en\/modernising-data-processing-with-new-tools\">data processing<\/a>, notebooks for experimentation, training jobs for model training, and <strong>KFServing<\/strong> for model deployment.<\/li><li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Kubernetes integration<\/strong>: It leverages Kubernetes for resource <strong>management<\/strong>, <strong>scheduling<\/strong>, and <strong>scaling<\/strong>. 
This makes Kubeflow workflows portable and scalable across different environments.<\/li><li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Flexibility and customization<\/strong>: You can tailor your <strong>ML workflows<\/strong> by choosing and configuring the specific components you need.<\/li><li style=\"font-weight: 400;\" aria-level=\"1\"><strong>Community-driven<\/strong>: Kubeflow is an active open-source project with a <strong>vibrant community<\/strong> that contributes to its development and supports users.<\/li><\/ul><p>Several different <a href=\"https:\/\/liora.io\/en\/choosing-the-right-cloud-provider-aws-vs-azure-vs-gcp-unveiled\">cloud providers<\/a> (<a href=\"https:\/\/liora.io\/en\/amazon-web-services-aws-unveiling-the-power-of-the-amazon-cloud\">AWS<\/a>, <a href=\"https:\/\/liora.io\/en\/google-cloud-platform-monitoring-features-and-benefits\">Google Cloud<\/a>, <a href=\"https:\/\/liora.io\/en\/microsoft-azure-empower-yourself-with-knowledge\">Microsoft Azure<\/a>) offer pre-configured Kubeflow distributions that can be installed on a Kubernetes cluster. For this article, <a href=\"https:\/\/cloud.google.com\/vertex-ai\/docs\/pipelines\/introduction\">Vertex Pipelines<\/a> will be used since it is a <b>fully-managed<\/b> Kubeflow service. Fully managed cloud services are handled by the cloud provider using automation, typically meaning that a developer doesn&#8217;t have to set up machines, apply patches, or back up clusters.<\/p>\t\t\n\t\t\t<h3>Pipelines and components<\/h3>\t\t\n\t\t<p>A Kubeflow Pipeline is a platform-agnostic way to define, orchestrate, and manage repeatable, end-to-end machine learning (ML) workflows based on containers.<\/p><p>Think of a pipeline as the workflow for your machine learning job. A <b>pipeline<\/b> is built up from <b>components<\/b>. Components are lightweight abstractions of a task in a pipeline. 
These tasks can be isolated containers or implementations of a function. This is useful because we can now reap several benefits in our machine learning workflows:<\/p><ul><li><b>Modularity<\/b>: Components break down complex machine learning workflows into manageable, reusable steps. This improves organization and scalability.<\/li><li><b>Reusability<\/b>: Well-defined components can be used across various pipelines, reducing code duplication and promoting efficiency.<\/li><li><b>Flexibility<\/b>: You can develop Kubeflow components in various ways, giving you flexibility in your technology choices.<\/li><\/ul>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Browse our training courses<\/a><\/div><\/div>\n\n\t\t\t<h3>Hello Kubeflow &#8211; A Rudimentary Example<\/h3>\t\t\n\t\t<p>In this example we are going to create three functions:<\/p><ul><li><i>example_string()<\/i> will take a string and append some text to it<\/li><li><i>example_number()<\/i> will take a number and add 10<\/li><li><i>example_combo()<\/i> will take the outputs of both of those functions and return them combined into a single string<\/li><\/ul>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"621\" height=\"458\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image3.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image3.png 621w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image3-300x221.png 300w\" sizes=\"(max-width: 621px) 100vw, 621px\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\tA couple of points here:\n<ul>\n \t<li>We are using the <a href=\"https:\/\/www.kubeflow.org\/docs\/components\/pipelines\/v2\/components\/lightweight-python-components\/\">kfp.dsl.component decorator<\/a> to transform our Python functions into components<\/li>\n 
\t<li>Each of these components will be executed in its own container. Note that we are defining our base image as well as our packages to install.<\/li>\n<\/ul><p>Next, we are going to define our pipeline, which will in turn define how the components are going to interact with each other. In this case, we are going to create a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Directed_acyclic_graph\">DAG<\/a> pipeline which will take the output of the <i>example_string()<\/i> and <i>example_number()<\/i> functions and feed them into the <i>example_combo()<\/i> function. Note that the pipeline creation is happening inside the <i>example_pipeline()<\/i> function and we are using the <a href=\"https:\/\/www.kubeflow.org\/docs\/components\/pipelines\/v2\/pipelines\/pipeline-basics\/\">kfp.dsl.pipeline<\/a> decorator.<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"953\" height=\"612\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image2.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image2.png 953w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image2-300x193.png 300w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image2-768x493.png 768w\" sizes=\"(max-width: 953px) 100vw, 953px\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t<p>After defining the pipeline function, the next step is to <b>compile<\/b> the pipeline. The output of this compilation is a YAML template (can also be JSON) that will give the instructions to the Kubeflow service on how to create the pipeline and its components. In this example, we are submitting the template to <strong>Google Cloud\u2019s Vertex Pipelines<\/strong> for execution. 
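Since the component and pipeline code for this hello-world example appears only in the screenshots, here is a plain-Python sketch of the same three functions. The exact strings in the images are not reproduced here, so the function bodies below are illustrative assumptions; in the article each function additionally carries the kfp.dsl.component decorator so that it runs in its own container:

```python
# Plain-Python sketch of the three hello-world functions shown in the
# screenshots. In the article each is wrapped with @kfp.dsl.component
# (with a base_image and packages_to_install) to become a component.

def example_string(text: str) -> str:
    # Append some text to the input string (the exact suffix is a
    # hypothetical stand-in for the one in the screenshot)
    return text + " - processed by a component"

def example_number(number: int) -> int:
    # Add 10 to the input number
    return number + 10

def example_combo(text: str, number: int) -> str:
    # Combine the outputs of the two upstream components into one string
    return f"{text} | {number}"

# Inside a @kfp.dsl.pipeline function, the DAG wiring would look like:
#   s_task = example_string(text="hello")
#   n_task = example_number(number=5)
#   c_task = example_combo(text=s_task.output, number=n_task.output)
```

The decorated versions behave the same way logically, except that each call inside the pipeline function produces a task whose `.output` is passed downstream rather than a plain return value.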
<\/p>\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"529\" height=\"228\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image5.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image5.png 529w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image5-300x129.png 300w\" sizes=\"(max-width: 529px) 100vw, 529px\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t<h3>A Minimalistic End-to-End Pipeline<\/h3>\t\t\n\t\tIf you are new to Google Cloud, refer to <a href=\"https:\/\/cloud.google.com\/docs\/get-started\">this page<\/a> to get started. In this example, we are going to be developing our pipeline on a <a href=\"https:\/\/cloud.google.com\/vertex-ai-notebooks?hl=en\">Vertex AI Notebook<\/a>. \t\t\n\t\t\t<h4>Installation<\/h4>\t\t\n\t\t\t<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>! pip3 install --no-cache-dir --upgrade \"kfp>2\" google-cloud-aiplatform\nimport google.cloud.aiplatform as aiplatform\nimport kfp\nfrom kfp import compiler, dsl\nfrom kfp.dsl import Artifact, Dataset, Input, Metrics, Model, Output, component\nfrom kfp.dsl import ClassificationMetrics\nfrom typing import NamedTuple\nPROJECT_ID = '<YOUR-PROJECT-ID>'  # replace with your project ID\nREGION = '<YOUR-REGION>'\nEXPERIMENT = 'vertex-pipelines'\nSERIES = 'dev'\n# GCS bucket for pipeline metadata and artifacts\nBUCKET_URI = f\"gs:\/\/{PROJECT_ID}-bucket\"  # @param {type:\"string\"}\naiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n\t\t<p>Note that Vertex Pipelines require a Google Cloud Storage bucket to store pipeline metadata and artifacts. If you don\u2019t have one created yet, you can create a bucket manually through the console or use this gsutil command:<\/p>\t\t\n\t\t\t<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>! 
gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Discover our training courses<\/a><\/div><\/div>\n\n\t\t\t<h4>Defining the components<\/h4>\t\t\n\t\t<p>We are going to create 4 different components:<\/p><ul><li><strong>get_data()<\/strong> &#8211; this will retrieve a curated dataset from a public BigQuery table and perform a train\/test split. Note that this example is using the <a href=\"https:\/\/www.kubeflow.org\/docs\/components\/pipelines\/v2\/data-types\/artifacts\/\">Output[Dataset] artifact<\/a> to store the metadata (such as the path of where the dataset is stored).<\/li><li><strong>train_model()<\/strong> &#8211; this will take the dataset artifact created in <i>get_data()<\/i>, train an XGBoost classification model, and finally use the Model artifact to store the model and its metadata in a Google Cloud Storage bucket. Note that custom metadata, such as the training score and framework, is being defined inside of the model artifact.<\/li><li><strong>eval_model()<\/strong> &#8211; this function will take the test dataset created in get_data() and the model artifact created in train_model() as inputs and create an evaluation step. In an MLOps process, we may want to ensure that the machine learning model meets a certain performance threshold before deploying it to an environment.<\/li><li><strong>deploy_xgboost_model()<\/strong> &#8211; finally, if our model meets the conditions established in eval_model(), the model will be deployed to a Vertex AI Endpoint where it will serve requests.<\/li><\/ul><p>Note that this is a simplified and minimalistic example. 
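The gating between evaluation and deployment is worth seeing in isolation. The standalone toy function below (not the component itself) shows the string-typed NamedTuple pattern that eval_model() uses to tell the downstream condition whether to deploy; the 0.8 default mirrors the component's score_threshold argument:

```python
from collections import namedtuple
from typing import NamedTuple

def deployment_decision(
    score: float, score_threshold: float = 0.8
) -> NamedTuple("Outputs", [("deploy", str)]):
    # Mirrors eval_model(): the "deploy" field is the *string* "true" or
    # "false" (not a bool), because dsl.Condition in the pipeline
    # definition compares it against the literal string "true".
    Outputs = namedtuple("Outputs", ["deploy"])
    return Outputs(deploy="true" if score >= score_threshold else "false")
```

Returning strings rather than booleans is deliberate: component outputs cross container boundaries as serialized parameters, so the pipeline-level comparison is done on the string value.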
In practice, instead of deploying the model directly in the same environment, the model artifact may have to traverse several different environments (<strong>Dev, UAT<\/strong>) before being deployed to a production environment.<\/p>\t\t\n\t\t\t<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>@dsl.component(base_image='python:3.8',\npackages_to_install=[\n    \"pandas==1.3.4\",\n    \"scikit-learn==1.0.1\",\n    \"google-cloud-bigquery==3.13.0\",\n    \"db-dtypes==1.1.1\"\n  ],\n)\ndef get_data(\n    project_id: str,\n    dataset_train: Output[Dataset],\n    dataset_test: Output[Dataset]\n) -> None:\n    \"\"\"Loads data from BigQuery, splits it into training and test sets,\n    and saves them as CSV files.\n    Args:\n      project_id: str\n      dataset_train: Output[Dataset] for the training set.\n      dataset_test: Output[Dataset] for the test set.\n    \"\"\"\n    from sklearn.model_selection import train_test_split\n    import pandas as pd\n    from google.cloud import bigquery\n    # Construct a BigQuery client object.\n    client = bigquery.Client(project=project_id)\n    job_config = bigquery.QueryJobConfig()\n    query = \"\"\"\n        SELECT\n      * EXCEPT(fullVisitorId)\n     FROM\n      # features\n      (SELECT\n       fullVisitorId,\n       IFNULL(totals.bounces, 0) AS bounces,\n       IFNULL(totals.timeOnSite, 0) AS time_on_site\n      FROM\n       `data-to-insights.ecommerce.web_analytics`\n      WHERE\n       totals.newVisits = 1\n       AND date BETWEEN '20160801' AND '20170430') # train on first 9 months\n      JOIN\n      (SELECT\n       fullvisitorid,\n       IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit\n      FROM\n        `data-to-insights.ecommerce.web_analytics`\n      GROUP BY fullvisitorid)\n      USING (fullVisitorId)\n      LIMIT 10000\n     ;\n     \"\"\"\n    query_job = client.query(query, job_config=job_config)\n    df = query_job.to_dataframe()\n    # Split data into training and test sets\n    train, test = train_test_split(df, test_size=0.3, random_state=42)\n    # Save to the output artifacts\n    train.to_csv(dataset_train.path, index=False)\n    test.to_csv(dataset_test.path, index=False)\n\n@dsl.component(base_image='python:3.8',\n  packages_to_install=[\n    \"xgboost==1.6.2\",\n    \"pandas==1.3.5\",\n    \"joblib==1.1.0\",\n    \"scikit-learn==1.0.2\",\n  ],\n)\ndef train_model(\n  dataset: Input[Dataset],\n  model_artifact: Output[Model]\n) -> None:\n    \"\"\"Trains an XGBoost classifier on a given dataset and saves the model artifact.\n    Args:\n      dataset: Input[Dataset]\n        The training dataset as a Kubeflow component input.\n      model_artifact: Output[Model]\n        A Kubeflow component output for saving the trained model.\n    Returns:\n      None\n        This function doesn't have a return value; its primary purpose is to produce a model artifact.\n    \"\"\"\n    import os\n    import joblib\n    import pandas as pd\n    from xgboost import XGBClassifier\n    # Load Training Data\n    data = pd.read_csv(dataset.path)\n    # Train XGBoost Model\n    model = XGBClassifier(objective=\"binary:logistic\")\n    model.fit(data.drop(columns=[\"will_buy_on_return_visit\"]), data.will_buy_on_return_visit)\n    # Evaluate and Log Metrics\n    score = model.score(data.drop(columns=[\"will_buy_on_return_visit\"]), data.will_buy_on_return_visit)\n    # Save the Model Artifact\n    os.makedirs(model_artifact.path, exist_ok=True)\n    joblib.dump(model, os.path.join(model_artifact.path, \"model.joblib\"))\n    # Metadata for the Artifact\n    model_artifact.metadata[\"train_score\"] = float(score)\n    model_artifact.metadata[\"framework\"] = \"XGBoost\"\n\n@dsl.component(base_image='python:3.8',\n packages_to_install=[\n   \"xgboost==1.6.2\",\n   \"pandas==1.3.5\",\n   \"joblib==1.1.0\",\n   \"scikit-learn==1.0.2\",\n   \"google-cloud-storage==2.13.0\",\n ],\n)\ndef 
eval_model(\n  test_set: Input[Dataset],\n  xgb_model: Input[Model],\n  metrics: Output[ClassificationMetrics],\n  smetrics: Output[Metrics],\n  bucket_name: str,\n  score_threshold: float = 0.8\n) -> NamedTuple(\"Outputs\", [(\"deploy\", str)]):\n    \"\"\"Evaluates an XGBoost model on a test dataset, logs metrics, and decides whether to deploy.\n    Args:\n      test_set: Input[Dataset]\n        The test dataset as a Kubeflow component input.\n      xgb_model: Input[Model]\n        The trained XGBoost model as a Kubeflow component input.\n      metrics: Output[ClassificationMetrics]\n        A Kubeflow component output for logging classification metrics.\n      smetrics: Output[Metrics]\n        A Kubeflow component output for logging scalar metrics.\n      bucket_name: str\n        The name of the Google Cloud Storage bucket containing the model.\n      score_threshold: float, default=0.8\n        The minimum score required for deployment.\n    Returns:\n      NamedTuple(\"Outputs\", [(\"deploy\", str)])\n        A named tuple with a single field:\n        * deploy: str\n          A string indicating whether to deploy the model (\"true\" or \"false\").\n    \"\"\"\n    from google.cloud import storage\n    import joblib\n    import pandas as pd\n    from sklearn.metrics import roc_curve, confusion_matrix\n    from collections import namedtuple\n    # Load Test Data and Model\n    data = pd.read_csv(test_set.path)\n    client = storage.Client()\n    bucket = client.get_bucket(bucket_name)\n    blob_path = xgb_model.uri.replace(f\"gs:\/\/{bucket_name}\/\", \"\")\n    smetrics.log_metric(\"blob_path\", str(blob_path))\n    blob = bucket.blob(f\"{blob_path}\/model.joblib\")\n    with blob.open(mode=\"rb\") as file:\n        model = joblib.load(file)\n    # Evaluation and Metrics\n    y_scores = model.predict_proba(data.drop(columns=[\"will_buy_on_return_visit\"]))[:, 1]\n    y_pred = model.predict(data.drop(columns=[\"will_buy_on_return_visit\"]))\n    score = model.score(data.drop(columns=[\"will_buy_on_return_visit\"]), data.will_buy_on_return_visit)\n    fpr, tpr, thresholds = roc_curve(data.will_buy_on_return_visit.to_numpy(), y_scores, pos_label=True)\n    metrics.log_roc_curve(fpr.tolist(), tpr.tolist(), thresholds.tolist())\n    cm = confusion_matrix(data.will_buy_on_return_visit, y_pred)\n    metrics.log_confusion_matrix([\"False\", \"True\"], cm.tolist())\n    smetrics.log_metric(\"score\", float(score))\n    # Deployment decision logic\n    deploy = \"true\" if score >= score_threshold else \"false\"\n    # Update the model artifact's metadata\n    xgb_model.metadata[\"test_score\"] = float(score)\n    Outputs = namedtuple(\"Outputs\", [\"deploy\"])\n    return Outputs(deploy)\n\n@dsl.component(base_image='python:3.8',\n packages_to_install=[\"google-cloud-aiplatform==1.25.0\"],\n)\ndef deploy_xgboost_model(\n    model: Input[Model],\n    project_id: str,\n    vertex_endpoint: Output[Artifact],\n    vertex_model: Output[Model]\n) -> None:\n    \"\"\"Deploys an XGBoost model to a Vertex AI Endpoint.\n    Args:\n      model: The model to deploy.\n      project_id: The Google Cloud project ID.\n      vertex_endpoint: Output[Artifact] representing the deployed Vertex AI Endpoint.\n      vertex_model: Output[Model] representing the deployed Vertex AI Model.\n    \"\"\"\n    from google.cloud import aiplatform\n    # Initialize the Vertex AI SDK with the project\n    aiplatform.init(project=project_id)\n    # Upload the model to the Vertex AI Model Registry\n    deployed_model = aiplatform.Model.upload(\n        display_name=\"xgb-classification\",\n        artifact_uri=model.uri,\n        serving_container_image_uri=\"us-docker.pkg.dev\/vertex-ai\/prediction\/xgboost-cpu.1-6:latest\",\n    )\n    # Deploy the model to an endpoint\n    endpoint = deployed_model.deploy(machine_type=\"n1-standard-4\")\n    # Save Outputs\n    vertex_endpoint.uri = endpoint.resource_name\n    vertex_model.uri = deployed_model.resource_name<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n\t\t\t<h4>Defining the 
pipeline<\/h4>\t\t\n\t\t<p>The next step is to define the pipeline function. Similar to the initial example, the pipeline components are stitched together to create a directed acyclic graph (DAG) of steps. The <a href=\"https:\/\/www.kubeflow.org\/docs\/components\/pipelines\/v2\/pipelines\/control-flow\/\">dsl.Condition<\/a> is introduced as a control flow step for the evaluation function. In this case, if the evaluation criterion is met (returns true), the pipeline will proceed with the deployment.<\/p>\t\t\n\t\t\t<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>PIPELINE_ROOT = BUCKET_URI + \"\/pipeline_root\/\"  # path to pipeline metadata\n@dsl.pipeline(\n    # Default pipeline root. You can override it when submitting the pipeline.\n    pipeline_root=PIPELINE_ROOT + \"xgboost-pipeline-v2\",\n    # A name for the pipeline. Used to determine the pipeline Context.\n    name=\"xgboost-pipeline-with-deployment-v2\",\n)\ndef pipeline():\n    dataset_op = get_data(project_id=PROJECT_ID)\n    training_op = train_model(dataset=dataset_op.outputs[\"dataset_train\"])\n    eval_op = eval_model(\n        test_set=dataset_op.outputs[\"dataset_test\"],\n        xgb_model=training_op.outputs[\"model_artifact\"],\n        bucket_name=\"kubeflow-mlops-410520-bucket\"\n    )\n    with dsl.Condition(\n        eval_op.outputs[\"deploy\"] == \"true\",\n        name=\"deploy\",\n    ):\n        deploy_op = deploy_xgboost_model(\n            model=training_op.outputs[\"model_artifact\"],\n            project_id=PROJECT_ID,\n        )<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n\t\t\t<h4>Compiling and executing the pipeline<\/h4>\t\t\n\t\t<p>Finally, the pipeline is compiled to create the YAML template. That template is submitted to the Vertex AI platform service. 
The model is then trained and deployed to the endpoint:<\/p>\t\t\n\t\t\t<pre data-line=\"\">\t\t\t\t<code readonly=\"true\">\n\t\t\t\t\t<xmp>compiler.Compiler().compile(pipeline_func=pipeline, package_path=\"pipeline-breast-cancer.yaml\")\njob = aiplatform.PipelineJob(\n    display_name=\"breast-cancer-demo-pipeline\",\n    template_path=\"pipeline-breast-cancer.yaml\",\n    pipeline_root=PIPELINE_ROOT,\n)\njob.run()<\/xmp>\n\t\t\t\t<\/code>\n\t\t\t<\/pre>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"889\" height=\"978\" src=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image4.png\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image4.png 889w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image4-273x300.png 273w, https:\/\/liora.io\/app\/uploads\/sites\/9\/2024\/04\/image4-768x845.png 768w\" sizes=\"(max-width: 889px) 100vw, 889px\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t<h3>Conclusion<\/h3>\t\t\n\t\t<p>This article covers a lot of ground. To summarize, we covered the basic components of Kubeflow pipelines and the part they play in creating modular, reusable, and flexible machine learning workflows. We also created a minimalistic end-to-end pipeline on Google Cloud. 
Congratulations!<\/p>\t\t\n\t\t\t\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex is-content-justification-center\"><div class=\"wp-block-button \"><a class=\"wp-block-button__link wp-element-button \" href=\"\/en\/courses\/data-ai\/\">Start a training course<\/a><\/div><\/div>\n\n\t\t\t<h3>References:<\/h3>\t\t\n\t\t<ul><li><a href=\"https:\/\/cloud.google.com\/vertex-ai\/docs\/pipelines\/introduction\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/cloud.google.com\/vertex-ai\/docs\/pipelines\/introduction\" data-sk=\"tooltip_parent\">https:\/\/cloud.google.com\/vertex-ai\/docs\/pipelines\/introduction<\/a><\/li><\/ul><ul data-stringify-type=\"unordered-list\" data-indent=\"0\" data-border=\"0\"><li data-stringify-indent=\"0\" data-stringify-border=\"0\"><a href=\"https:\/\/github.com\/GoogleCloudPlatform\/vertex-ai-samples\/blob\/main\/notebooks\/official\/pipelines\/kfp2_pipeline.ipynb\">https:\/\/github.com\/GoogleCloudPlatform\/vertex-ai-samples\/blob\/main\/notebooks\/official\/pipelines\/kfp2_pipeline.ipynb<\/a><\/li><li data-stringify-indent=\"0\" data-stringify-border=\"0\">Code adapted from Sascha Heyer:&nbsp;<a href=\"https:\/\/medium.com\/google-cloud\/google-vertex-ai-the-easiest-way-to-run-ml-pipelines-3a41c5ed153\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/medium.com\/google-cloud\/google-vertex-ai-the-easiest-way-to-run-ml-pipelines-3a41c5ed153\" data-sk=\"tooltip_parent\">https:\/\/medium.com\/google-cloud\/google-vertex-ai-the-easiest-way-to-run-ml-pipelines-3a41c5ed153<\/a><\/li><\/ul>","protected":false},"excerpt":{"rendered":"<p>Motivation Machine Learning Development goes beyond developing a model. Productionizing a model quickly becomes a multi-faceted, indeed even a multi-disciplinary process that involves several key stakeholders across both the business and IT. 
As depicted in the image below, the code to actually develop a model is a small component compared to other complexities such as [&hellip;]<\/p>\n","protected":false},"author":47,"featured_media":184238,"comment_status":"open","ping_status":"open","sticky":false,"template":"elementor_theme","format":"standard","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"categories":[2433],"class_list":["post-184236","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/184236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/users\/47"}],"replies":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/comments?post=184236"}],"version-history":[{"count":1,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/184236\/revisions"}],"predecessor-version":[{"id":205844,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/posts\/184236\/revisions\/205844"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media\/184238"}],"wp:attachment":[{"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/media?parent=184236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/liora.io\/en\/wp-json\/wp\/v2\/categories?post=184236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}