Cutad API Docs
Complete reference for the Cutad AI API Gateway. OpenAI-compatible endpoints, streaming support, and production-ready infrastructure.
Authentication
All API requests require a Bearer token in the Authorization header. Obtain your API key from the dashboard.
API keys use the cutad- prefix. Keep your key secret: never expose it in client-side code or public repositories.
Error Response (Invalid Key)
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
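A minimal Python sketch of the authentication scheme described above, using only the standard library. The helper names `build_headers` and `is_invalid_key_error` are illustrative assumptions and are not part of any official Cutad client.

```python
import json


def build_headers(api_key: str) -> dict:
    """Return HTTP headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def is_invalid_key_error(body: str) -> bool:
    """Return True if a response body matches the invalid-key error shown above."""
    try:
        return json.loads(body)["error"]["code"] == "invalid_api_key"
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like the documented error object.
        return False
```

In practice the key would usually be read from an environment variable or a secrets manager rather than written into source code.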
Base URL
https://mimo.lokerin.net
All endpoints are relative to this base URL. The API is OpenAI-compatible — use the OpenAI SDK with a custom base_url.
Chat Completions
POST /v1/chat/completions
Create a chat completion. Supports both standard and streaming responses.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model ID: cutad-agent or cutad-agent-pro |
| messages | array | required | Array of message objects with role and content |
| stream | boolean | optional | Enable streaming responses. Default: false |
| temperature | float | optional | Sampling temperature, 0–2. Default: 1 |
| max_tokens | integer | optional | Maximum tokens in the response |
| top_p | float | optional | Nucleus sampling, 0–1. Default: 1 |
| stop | string or array | optional | Stop sequences |
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | system, user, or assistant |
| content | string | The message content |
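The request body and message objects above could be assembled as in the following sketch. It only builds the request and does not send it. The function name `build_chat_request`, the placeholder key, and the example prompt are assumptions for illustration; the URL and parameter names come from the tables above.

```python
import json
import urllib.request

BASE_URL = "https://mimo.lokerin.net"


def build_chat_request(api_key, messages, model="cutad-agent", **options):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Extra keyword arguments such as temperature, max_tokens, top_p, stop,
    or stream are copied into the JSON body unchanged.
    """
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "cutad-your-key",  # placeholder, not a real key
    [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain quantum computing in one sentence."},
    ],
    temperature=0.7,
    max_tokens=200,
)
```

To send it, pass the object to `urllib.request.urlopen(request)` and decode the response with `json.load`.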
List Models
GET /v1/models
Returns a list of available models.
Response
{
  "object": "list",
  "data": [
    { "id": "cutad-agent", "object": "model", "owned_by": "cutad" },
    { "id": "cutad-agent-pro", "object": "model", "owned_by": "cutad" }
  ]
}
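As a rough sketch, the model IDs can be pulled out of this response as shown below. The helper names `build_models_request` and `model_ids` are hypothetical; the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://mimo.lokerin.net"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /v1/models."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(body: str) -> list:
    """Extract the model IDs from a /v1/models response body."""
    return [model["id"] for model in json.loads(body)["data"]]


sample_body = """
{"object": "list", "data": [
  {"id": "cutad-agent", "object": "model", "owned_by": "cutad"},
  {"id": "cutad-agent-pro", "object": "model", "owned_by": "cutad"}
]}
"""
print(model_ids(sample_body))  # prints ['cutad-agent', 'cutad-agent-pro']
```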
Streaming
Set "stream": true in your request to receive Server-Sent Events (SSE). Each chunk arrives as a single line that starts with data: followed by a JSON object.
Stream Chunk Format
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
The stream ends with data: [DONE]. Always check for this sentinel value when consuming the stream.
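One way to consume the stream is sketched below: read it line by line, skip anything that is not a data line, stop at the [DONE] sentinel, and collect the delta content. The function name `iter_content` is an assumption for illustration, and the sample lines mirror the chunk format shown above.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # sentinel: the stream is complete
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_content(sample_lines)))  # prints "Hello world"
```

When reading from a live HTTP response, the lines typically arrive as bytes and need to be decoded as UTF-8 before being passed to a function like this.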
Available Models
| Model ID | Description | Rate Limit | Streaming |
|---|---|---|---|
| cutad-agent | General-purpose AI agent. Fast responses, reliable output. | 15 req/min | Yes |
| cutad-agent-pro | Enhanced reasoning, complex tasks, longer context. Recommended for production. | 20 req/min | Yes |
Request Format
Send JSON requests with Content-Type: application/json.
Response Format
Successful responses return JSON in the OpenAI-compatible format.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "cutad-agent",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 64,
    "total_tokens": 89
  }
}
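A short sketch of reading the fields most callers need from this response: the assistant text, the finish reason, and the token usage. The `Completion` dataclass and `parse_completion` function are hypothetical names introduced for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    finish_reason: str
    total_tokens: int


def parse_completion(body: str) -> Completion:
    """Extract the first choice and the token usage from a response body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return Completion(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=data["usage"]["total_tokens"],
    )


sample_response = json.dumps({
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "cutad-agent",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Quantum computing uses qubits..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 25, "completion_tokens": 64, "total_tokens": 89},
})
completion = parse_completion(sample_response)
```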
Error Codes
The API uses standard HTTP status codes. Error responses include a JSON body with details.
Error Response Format
{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type",
    "code": "error_code"
  }
}
Common Error Types
| Type | Code | Description |
|---|---|---|
| authentication_error | invalid_api_key | API key is invalid or expired |
| invalid_request_error | missing_model | No model specified in the request |
| invalid_request_error | invalid_model | Specified model does not exist |
| rate_limit_error | rate_limited | Request queued — waiting for available slot |
| server_error | internal_error | Unexpected server-side failure |
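The error format above could be turned into a Python exception roughly as follows. The class name `CutadAPIError`, the function `parse_error`, and the choice to treat `rate_limit_error` and `server_error` as retryable are assumptions made for this sketch, not documented behavior.

```python
import json

# Assumption: only these error types are worth retrying.
RETRYABLE_TYPES = {"rate_limit_error", "server_error"}


class CutadAPIError(Exception):
    """Hypothetical exception wrapping the documented error object."""

    def __init__(self, status, error_type, code, message):
        super().__init__(f"{status} {error_type}/{code}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.error_type in RETRYABLE_TYPES


def parse_error(status: int, body: str) -> CutadAPIError:
    """Build an exception from an HTTP status code and an error response body."""
    try:
        error = json.loads(body)["error"]
        return CutadAPIError(status, error["type"], error["code"], error["message"])
    except (ValueError, KeyError, TypeError):
        # The body was not JSON or did not follow the documented shape.
        return CutadAPIError(status, "unknown_error", "unknown", body[:200])
```

A caller would typically raise the returned exception and decide whether to retry based on its `retryable` property.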
Rate Limits
Rate limits are applied per API key, per model. When the limit is reached, requests are automatically queued and processed once a slot becomes available (max wait: 2 minutes). If no slot becomes available within that time, the request fails with a 503 timeout error.
Rate Limit Headers
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until slot available (on 503 timeout) |
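A client could use these headers to decide how long to wait before the next request, as in the sketch below. The function name `retry_delay` and the 5-second fallback are assumptions. Header values are assumed to arrive as strings, and the lookup is case-sensitive, so a plain dict must use the exact header names from the table.

```python
import time
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
    default: float = 5.0,
) -> Optional[float]:
    """Return the seconds to wait before sending again, or None if no wait is needed."""
    if status == 503:
        # Queue timeout: prefer the server's Retry-After hint.
        retry_after = headers.get("Retry-After")
        return float(retry_after) if retry_after is not None else default
    if headers.get("X-RateLimit-Remaining") == "0":
        # Window exhausted: wait until the reset timestamp.
        reset = headers.get("X-RateLimit-Reset")
        if reset is None:
            return default
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return None
```

For example, `retry_delay(503, {"Retry-After": "12"})` returns `12.0`, and a successful response with requests remaining returns `None`.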