Use superserve.serve() to deploy agents as HTTP endpoints via Ray Serve:
import superserve
from superserve import Agent


@superserve.tool(num_cpus=1)
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"


class MyAgent(Agent):
    tools = [search]

    async def run(self, query: str) -> str:
        result = await self.call_tool("search", query=query)
        return f"Found: {result}"


superserve.serve(MyAgent(), name="my-agent", num_cpus=1, memory="2GB")
Run with:
superserve up

Serve Options

Option        Type       Default    Description
agent         Any        required   Agent instance to serve
name          str        None       Agent name (inferred from directory if not set)
host          str        "0.0.0.0"  Host to bind to
port          int        8000       HTTP port
num_cpus      int/float  1          CPU cores per replica
num_gpus      int/float  0          GPUs per replica
memory        str        "2GB"      Memory per replica
replicas      int        1          Number of replicas
route_prefix  str        /agents/   URL prefix for this agent
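For example, a sketch that overrides several of these defaults (the name, port, and route_prefix values here are illustrative):
superserve.serve(
    MyAgent(),
    name="internal-agent",
    host="127.0.0.1",           # bind to localhost only
    port=8080,
    route_prefix="/internal/",  # served under /internal/ instead of /agents/
    replicas=2,
)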

Endpoints

Each agent gets an endpoint at its route prefix:
POST /{agent_name}/
Request body:
{
  "query": "Hello!"
}
Response:
{
  "response": "Hi there! How can I help you?"
}
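You can exercise the endpoint with any HTTP client. This sketch assumes the my-agent deployment from above is running on the default host and port:
import requests

# POST a query to the agent's endpoint (default /agents/ route prefix)
resp = requests.post(
    "http://localhost:8000/agents/my-agent/",
    json={"query": "Hello!"},
)
print(resp.json()["response"])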

Streaming

Enable streaming by setting the stream parameter in your request.
Stream Mode  Response Format
"text"       Server-sent events with text chunks
"events"     Server-sent events with JSON objects
Text streaming request:
{
  "query": "Hello!",
  "stream": "text"
}
Event streaming request:
{
  "query": "Hello!",
  "stream": "events"
}
Streaming support depends on the agent framework. Most frameworks support both text and event streaming automatically.
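As a sketch of consuming a text stream with a generic HTTP client (standard SSE "data:" framing is assumed here; the exact chunk format depends on the agent framework):
import requests

# Stream the agent's response as server-sent events
with requests.post(
    "http://localhost:8000/agents/my-agent/",
    json={"query": "Hello!", "stream": "text"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data: "):
            print(line[len(b"data: "):].decode(), end="", flush=True)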

CLI: superserve up

The superserve up command discovers and runs all agents and MCP servers in your project:
superserve up
Serving at http://localhost:8000

  Agents:
    POST http://localhost:8000/agents/myagent/

  MCP Servers:
    http://localhost:8000/weather/mcp
It automatically:
  1. Imports all agents/*/agent.py files
  2. Imports all mcp_servers/*/server.py files
  3. Collects superserve.serve() and superserve.serve_mcp() registrations
  4. Starts Ray Serve with everything on one port
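That discovery convention implies a project layout along these lines (the directory names are illustrative):
my-project/
  agents/
    myagent/
      agent.py       # calls superserve.serve(...)
  mcp_servers/
    weather/
      server.py      # calls superserve.serve_mcp(...)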

CLI Options

superserve up [OPTIONS]
Option     Default  Description
--port     8000     HTTP server port
--host     0.0.0.0  Host to bind to
--agents   all      Comma-separated list of agents to run
--servers  all      Comma-separated list of MCP servers to run
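For example, to serve only specific agents on a different port (the agent names are illustrative):
superserve up --port 9000 --agents myagent,other-agent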

Scaling with Replicas

Ray Serve automatically load balances requests across replicas:
superserve.serve(
    agent,
    name="high-throughput",
    num_cpus=1,
    memory="2GB",
    replicas=4,
)

GPU Agents

For ML inference workloads, allocate GPUs per replica:
superserve.serve(
    agent,
    name="ml-agent",
    num_gpus=1,
    memory="8GB",
    replicas=2,
)

Resource Planning

When planning cluster resources, consider:
Agent Config              Cluster Requirement
replicas=3, num_cpus=2    6 CPUs total
replicas=2, num_gpus=1    2 GPUs total
replicas=4, memory="4GB"  16 GB RAM total
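The rule is simply per-replica resources multiplied by the replica count. A small sketch (this helper is hypothetical, not part of superserve):
def cluster_requirements(replicas: int, num_cpus: float = 1,
                         num_gpus: float = 0, memory_gb: float = 2) -> dict:
    """Total cluster resources one deployment needs across all replicas."""
    return {
        "cpus": replicas * num_cpus,
        "gpus": replicas * num_gpus,
        "memory_gb": replicas * memory_gb,
    }

print(cluster_requirements(replicas=4, memory_gb=4))
# {'cpus': 4, 'gpus': 0, 'memory_gb': 16}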

Monitoring

While the server is running, you can access:
  • Agent endpoints: http://localhost:8000/agents/{agent_name}/
  • Health check: http://localhost:8000/agents/{agent_name}/health
  • Ray Dashboard: http://localhost:8265
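For example, a simple liveness probe against the health endpoint listed above (the response body format is not specified here, so only the status code is checked):
import requests

# Probe the agent's health endpoint; a 200 status means the replica is up
resp = requests.get("http://localhost:8000/agents/my-agent/health")
resp.raise_for_status()
print("my-agent is healthy")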
The Ray Dashboard provides visibility into:
  • Cluster resources and utilization
  • Task and actor status
  • Logs and metrics