Quick Start¶
In this tutorial, you will deploy your first model to a Kubernetes cluster.
You will learn:
- How to get the project code.
- How to review and apply the Helm configuration.
- How to verify the deployment status.
- How to send requests to your running model.
Prerequisites¶
Before you begin, ensure you have the following:
- Access to a Kubernetes cluster with the KubeRay operator installed. If you do not have KubeRay installed, follow the Installation Guide.
- The kubectl command-line tool configured to communicate with your cluster.
- Basic familiarity with Kubernetes concepts such as pods and namespaces.
Step 1: Get the code¶
Open your terminal and clone the repository:
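For example (the repository URL here is inferred from the working_dir in the sample config below):

```bash
# Clone the Model Service repository and move into it
git clone https://github.com/RationAI/model-service.git
cd model-service
```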
Step 2: Add your model code¶
If you want to deploy your own model, start by adding the model code and Ray Serve application entry point. Follow the Adding New Models guide for a step-by-step walkthrough of this process.
Step 3: Review the configuration¶
In Model Service, configurations are managed using Helm. The environment configuration is split into values and applications inside the helm/rayservice/ directory.
Applications are configured by adding simple YAML definitions into the helm/rayservice/applications/ folder.
Let's look at a sample application definition (e.g. helm/rayservice/applications/prostate-classifier-1.yaml):
- name: prostate-classifier-1
import_path: models.binary_classifier:app
route_prefix: /prostate-classifier-1
runtime_env:
working_dir: https://github.com/RationAI/model-service/archive/refs/heads/main.zip
# ...
Let's break down the application config:
- name: Logical app name (visible in the Ray dashboard and logs).
- import_path: Python entrypoint (module.path:variable).
- route_prefix: HTTP path under the Serve gateway.
- runtime_env.working_dir: Git ZIP URL Ray downloads before startup. It must point to committed and pushed code.
For your first deployment, we will use the existing configuration without changes.
Step 4: Deploy the service¶
To deploy the service, run Helm:
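The exact release name and value overrides depend on your environment; assuming the chart lives in helm/rayservice/ as described above, a minimal invocation might look like this:

```bash
# Install (or upgrade) a dedicated test release of the chart
helm upgrade --install rayservice-model-my-model ./helm/rayservice \
  --namespace <your-namespace>
```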
Use a dedicated test release name (for example rayservice-model-my-model) so test actions do not affect the main deployment.
This command automates the deployment: Helm renders the chart templates and applies the resulting manifests to the Kubernetes cluster.
If you changed or added an application definition that points runtime_env.working_dir to your branch, commit and push those changes before running Helm so Ray can fetch the updated code snapshot.
Tip for avoiding cache issues: Ray caches the downloaded working_dir keyed on the exact URL string. If you push new code to the same branch ZIP URL, Ray will keep using the stale cached copy. To force a refresh, append a query parameter to the working_dir URL in your config, such as ?v=1, ?v=2, and so on. You can make this change locally before deploying; it does not need to be pushed to the remote repository.
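For example, appending a version parameter to the sample application's working_dir (a sketch; the ?v=2 value is arbitrary):

```yaml
runtime_env:
  working_dir: https://github.com/RationAI/model-service/archive/refs/heads/main.zip?v=2
```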
Step 5: Monitor the deployment¶
Deploying models takes time as the cluster downloads images and starts worker pods.
Check the overall status of your RayService:
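With the KubeRay operator installed, the RayService custom resource can be listed directly (the namespace is a placeholder):

```bash
kubectl get rayservice -n <your-namespace>
```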
Check the status of the individual pods:
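For example, in the namespace you deployed to:

```bash
kubectl get pods -n <your-namespace>
```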
If the pods fail to start, you can inspect the details for troubleshooting:
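For example (the pod name is a placeholder taken from the previous command's output):

```bash
# Inspect events such as image-pull or scheduling failures
kubectl describe pod <pod-name> -n <your-namespace>

# Read the container logs
kubectl logs <pod-name> -n <your-namespace>
```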
Note: Using the Ray Dashboard¶
Ray provides a dashboard for visual monitoring. To access it, forward the port to your local machine:
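KubeRay typically exposes the dashboard on port 8265 of the head node's service; the service name below is a placeholder, so check kubectl get svc for the actual name in your cluster:

```bash
kubectl port-forward svc/<rayservice-name>-head-svc 8265:8265 -n <your-namespace>
```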
Open a web browser and navigate to http://127.0.0.1:8265. Your models are ready when their Serve applications display a RUNNING status.
Step 6: Send a request¶
To communicate with the model from your local machine, forward the Serve port:
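Ray Serve listens on port 8000 by default; as above, the service name is a placeholder:

```bash
kubectl port-forward svc/<rayservice-name>-serve-svc 8000:8000 -n <your-namespace>
```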
You can now send HTTP requests to http://localhost:8000/prostate-classifier-1/.
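The request format depends entirely on the model; as a sketch, assuming the app accepts a JSON body (the payload below is purely illustrative):

```bash
curl -X POST http://localhost:8000/prostate-classifier-1/ \
  -H "Content-Type: application/json" \
  -d '{"data": [0.1, 0.2, 0.3]}'
```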
Step 7: Clean up¶
When you are finished, delete the deployment to free up cluster resources:
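Assuming you deployed with Helm as in Step 4:

```bash
# Remove the Helm release and the Kubernetes resources it created
helm uninstall <release-name> -n <your-namespace>
```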
This removes only your selected <release-name> test deployment.
Related Guides¶
- To deploy your own custom Python model, see Adding New Models.
- To configure scaling or memory settings, read the Configuration Reference.
- For a comprehensive walkthrough of a production deployment, see the Deployment Guide.
- To understand the internal architecture, explore the Architecture Overview.