BentoML: Helping Deploy ML Models

Last Updated : 20 Feb, 2026

BentoML model deployment is the process of converting a trained machine learning model into a fully functional API service. It lets you package the model, along with any pre-processing or post-processing logic, into a deployable unit called a Bento. It supports many popular ML frameworks such as scikit-learn, TensorFlow, PyTorch and XGBoost. Once the service is created, you can serve it locally for testing or containerize it with Docker for deployment to production environments such as Kubernetes or cloud platforms.

Figure: Deploying ML Models

Key Features

  • Model Packaging: BentoML packages machine learning models into a standardized format called a Bento, which includes the trained model, any custom pre-processing or post-processing logic and all required Python dependencies.
  • Multi-Framework Support: It supports a wide variety of machine learning and deep learning libraries such as scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM and Hugging Face Transformers, as well as custom Python models.
  • API Serving: BentoML automatically exposes each method decorated with @bentoml.api as an HTTP (REST) endpoint. The server is built on the ASGI framework Starlette rather than FastAPI, and it supports asynchronous serving with minimal configuration. gRPC serving was offered by the older BentoML 1.0 and 1.1 service API.
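  • Runners for Scalable Inference: In BentoML 1.0 and 1.1, runners isolate the model's prediction logic from the API interface so that inference can be scaled independently. The class-based @bentoml.service API used in this article replaces runners, and each service is itself the unit that gets scaled.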

Deploying ML Models using BentoML

Step 1: Install BentoML

This command installs BentoML, which provides the tools needed to package, serve and deploy machine learning models, along with scikit-learn, pandas and NumPy, which the example below uses.

Shell
pip install bentoml scikit-learn pandas numpy
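
Before moving on, it can help to confirm that the packages are actually visible to the interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name installed_versions is hypothetical and not part of BentoML.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration; it is not part of BentoML.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the packages installed in Step 1
for name, version in installed_versions(
    ["bentoml", "scikit-learn", "pandas", "numpy"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

If any entry prints "not installed", the pip command probably ran in a different virtual environment from the one running this script.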

Step 2: Train and Save Your Model

Here we train a machine learning model with scikit-learn and save it with BentoML’s save_model function to create a reusable model artifact. Create a file named train.py and paste the code below.

Python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
import bentoml

# Fake training dataset
data = pd.DataFrame({
    "monthly_charges": [50, 80, 20, 90, 60, 100, 30, 70],
    "tenure": [12, 2, 24, 1, 10, 3, 30, 5],
    "support_tickets": [1, 5, 0, 7, 2, 6, 0, 4],
    "churn": [0, 1, 0, 1, 0, 1, 0, 1]
})

X = data[["monthly_charges", "tenure", "support_tickets"]]
y = data["churn"]

model = LogisticRegression()
model.fit(X, y)

# Save model into BentoML model store
bentoml.sklearn.save_model("churn_model", model)

print("Model trained and saved successfully!")

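Step 4: Run the Training Script

python train.py

This saves the model inside BentoML’s model store. You can confirm it with bentoml models list, which should show a churn_model entry with an auto-generated version tag.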


Step 5: Create a BentoML Service

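This code defines a BentoML service that loads the saved model and exposes a predict API to receive input data and return predictions. Save it as service.py in the same directory, since the later commands refer to it as service:ChurnPredictor.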

Python
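import bentoml
import pandas as pd

# Column order must match the features used in train.py
FEATURE_COLUMNS = ["monthly_charges", "tenure", "support_tickets"]

@bentoml.service(resources={"cpu": 1})
class ChurnPredictor:

    def __init__(self):
        # Load the most recently saved version of the model from the model store
        self.model = bentoml.sklearn.load_model("churn_model:latest")

    @bentoml.api()
    def predict(self, monthly_charges: float, tenure: int, support_tickets: int) -> dict:
        # Build a one-row DataFrame so the feature names match those seen during training
        features = pd.DataFrame(
            [[monthly_charges, tenure, support_tickets]], columns=FEATURE_COLUMNS
        )
        prediction = self.model.predict(features)[0]
        # predict_proba returns [P(class 0), P(class 1)]; index 1 is the churn probability
        probability = self.model.predict_proba(features)[0][1]

        return {
            "churn_prediction": int(prediction),
            "churn_probability": float(probability)
        }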
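
The service returns both a class label and a probability. For a binary scikit-learn LogisticRegression with default settings, the two are linked: the probability is the sigmoid of a weighted sum of the features, and the label is 1 when that probability exceeds 0.5. The sketch below illustrates the relationship in plain Python. The weights and bias are made-up values chosen for illustration, not the coefficients the training script would learn.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def churn_probability(features, weights, bias) -> float:
    """P(churn = 1) for one row: sigmoid of the linear score w . x + b."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def classify(probability: float, threshold: float = 0.5) -> int:
    """Turn a probability into a 0/1 label using a decision threshold."""
    return int(probability > threshold)


# Hypothetical coefficients for (monthly_charges, tenure, support_tickets)
WEIGHTS = (0.05, -0.1, 0.4)
BIAS = -3.0

p = churn_probability((95, 2, 6), WEIGHTS, BIAS)
print({"churn_prediction": classify(p), "churn_probability": round(p, 3)})
```

In the real service, model.predict and model.predict_proba do this work with the fitted coefficients, which are available as model.coef_ and model.intercept_.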

Step 6: Serve Locally for Testing

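This command starts a local web server that hosts your model’s API, allowing you to test prediction requests in real time. By default the server listens on port 3000, and the --reload flag restarts it whenever the source code changes.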

Shell
bentoml serve service:ChurnPredictor --reload

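Step 7: Test the API

With the server running, send a POST request to the /predict endpoint. Each parameter of the predict method becomes a key in the JSON request body.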

curl -X POST "http://localhost:3000/predict" \
-H "Content-Type: application/json" \
-d "{\"monthly_charges\":95, \"tenure\":2, \"support_tickets\":6}"

Example Response:

{
  "churn_prediction": 1,
  "churn_probability": 0.87
}
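
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library and assumes the service from the earlier step is listening on http://localhost:3000. The helper names build_predict_request and predict are hypothetical.

```python
import json
import urllib.request


def build_predict_request(base_url, monthly_charges, tenure, support_tickets):
    """Build a POST request for /predict with the three features as JSON."""
    payload = {
        "monthly_charges": monthly_charges,
        "tenure": tenure,
        "support_tickets": support_tickets,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, **features):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(base_url, **features)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once `bentoml serve` is running:
# predict("http://localhost:3000", monthly_charges=95, tenure=2, support_tickets=6)
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without starting the server.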


Step 8: Build the Bento for Deployment

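This command packages your service code and model into a versioned Bento bundle ready for deployment. Run it from the project directory. Build options such as the service entry point (service:ChurnPredictor), the files to include and the Python packages to install are typically declared in a bentofile.yaml file next to service.py.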

Shell
bentoml build

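Step 9: Deploy to Cloud or Server

First build a Docker image from the Bento with bentoml containerize <bento_tag>, using the tag printed by bentoml build. This requires Docker to be installed and running. You can then deploy the image to platforms like: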

  • AWS (ECS, Lambda, SageMaker)
  • Google Cloud Run
  • Azure Container Apps
  • Kubernetes
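
Whichever platform you choose, it is useful to confirm that the deployed container is ready before sending traffic to it. BentoML's HTTP server exposes health endpoints such as /readyz by default. The sketch below polls that endpoint using only the standard library; the function name wait_until_ready and its default timings are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, timeout_s=60.0, interval_s=2.0, path="/readyz"):
    """Poll a health endpoint until it answers HTTP 200 or the timeout expires.

    Returns True if the service became ready, False otherwise.
    """
    url = base_url.rstrip("/") + path
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet, or a non-2xx status was returned; retry below
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (replace the URL with your deployment's address):
# wait_until_ready("http://localhost:3000", timeout_s=30)
```

Orchestrators such as Kubernetes can call the same endpoint directly through a readiness probe, so a script like this is mainly useful for smoke tests and CI pipelines.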

Advantages

  1. Streamlined Model Deployment: BentoML simplifies the process of turning ML models into deployable services by handling packaging, API creation and infrastructure integration in one tool.
  2. Multi-Framework Compatibility: It supports a wide range of machine learning frameworks such as scikit-learn, TensorFlow, PyTorch and XGBoost, as well as custom models, making it flexible for diverse workflows.
  3. Automatic API Generation: You can expose models as REST APIs with minimal code. Each method decorated with @bentoml.api becomes an HTTP endpoint, enabling quick integration with applications.
  4. Containerization Support: BentoML can build a Docker image from a Bento with the bentoml containerize command, making it easy to deploy models in cloud or on-premises environments.

Disadvantages

  1. Steeper Learning Curve for Beginners: Users new to concepts like containerization, API serving or MLOps might find BentoML’s setup and structure somewhat complex at first.
  2. Overhead for Simple Use Cases: For basic models or quick tests, the full Bento packaging and Docker build steps may feel like unnecessary overhead.
  3. Infrastructure Knowledge Required: Advanced use cases like GPU configuration or cloud deployment may require DevOps skills and knowledge of Docker, Kubernetes or cloud services.
  4. Limited GUI Without Yatai: Without Yatai, users rely primarily on the CLI or code, which may be limiting for those who prefer visual interfaces for managing models and deployments.