# API Reference Documentation

## Overview

Our AI API implements OpenAI-compatible interfaces that support:

* Model listing and querying
* Chat completion (streaming and non-streaming)
* Text completion (streaming and non-streaming)
* Administrator model management

## Basic Information

* **Base URL**: `https://apis.gradient.network/api/v1`
* **Authentication**: Access Key
* **Content Type**: `application/json`
* **API Version**: v1

## Authentication

### Access Key Authentication

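Send your Access Key as a Bearer token in the `Authorization` header of every authenticated request:

```http
Authorization: Bearer your-access-key-here
```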

## API Endpoints

### 1. Model Management APIs

#### **1.1 List All Models**

**Endpoint**: `GET /ai/models`

**Description**: Returns the list of all available AI models. No authentication is required.

**Request Parameters**: None

**Response Example**:

```json
{
  "object": "list",
  "data": [
    {
      "id": "qwen/qwen3-coder-480b-instruct-fp8",
      "object": "model",
      "created": 1640995200,
      "owned_by": "qwen",
      "permission": [],
      "root": "qwen/qwen3-coder-480b-instruct-fp8",
      "parent": null
    }
  ]
}
```

**Status Codes**:

* `200`: Success
* `500`: Internal Server Error
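As a minimal sketch of consuming this endpoint, the snippet below fetches the model list and collects the `id` of each entry. It uses only the Python standard library; the helper names `extract_model_ids` and `list_model_ids` are illustrative and not part of the API.

```python
import json
import urllib.request

API_BASE = "https://apis.gradient.network/api/v1"


def extract_model_ids(payload):
    """Return the `id` of every entry in a model-list response body."""
    return [model["id"] for model in payload.get("data", [])]


def list_model_ids(timeout=10):
    """Fetch /ai/models (no Access Key is sent) and return the model IDs."""
    with urllib.request.urlopen(f"{API_BASE}/ai/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


# Offline check against a trimmed copy of the response example above.
sample = {
    "object": "list",
    "data": [{"id": "qwen/qwen3-coder-480b-instruct-fp8", "object": "model"}],
}
print(extract_model_ids(sample))
```

Calling `list_model_ids()` performs the actual request; `urllib.request.urlopen` raises `urllib.error.HTTPError` for a `500` response.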

### 2. Chat Completion API

#### **2.1 Chat Completion**

**Endpoint**: `POST /ai/chat/completions`

**Description**: Creates a chat completion. Supports both streaming and non-streaming responses.

**Authentication**: Access Key

**Request Parameters**:

| Parameter           | Type         | Required | Description                                      |
| ------------------- | ------------ | -------- | ------------------------------------------------ |
| `model`             | string       | Yes      | The ID of the model to use                       |
| `messages`          | array        | Yes      | Array of conversation messages                   |
| `stream`            | boolean      | No       | Whether to use streaming response, default false |
| `max_tokens`        | integer      | No       | Maximum number of tokens to generate             |
| `temperature`       | number       | No       | Sampling temperature, 0-2, default 1             |
| `top_p`             | number       | No       | Nucleus sampling parameter, 0-1, default 1       |
| `n`                 | integer      | No       | Number of responses to generate, default 1       |
| `stop`              | string/array | No       | Sequence(s) at which generation stops            |
| `presence_penalty`  | number       | No       | Presence penalty, -2.0 to 2.0, default 0         |
| `frequency_penalty` | number       | No       | Frequency penalty, -2.0 to 2.0, default 0        |
| `logit_bias`        | object       | No       | Modify sampling probability for specified tokens |
| `user`              | string       | No       | User identifier                                  |
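The ranges in the table above can be checked on the client before a request is sent, which avoids a round trip that would end in `400`. The following is a sketch under that assumption; `validate_chat_params` is a hypothetical helper, and the server may apply additional checks that are not listed here.

```python
def validate_chat_params(params):
    """Return a list of problems found in a chat completion request body."""
    problems = []

    model = params.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")

    if not isinstance(params.get("messages"), list):
        problems.append("messages is required and must be an array")

    # Numeric ranges copied from the parameter table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        if name not in params:
            continue  # optional parameter, server default applies
        value = params[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name} must be a number between {low} and {high}")

    return problems


request_body = {
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
}
print(validate_chat_params(request_body))
```

An empty list means the body passed these client-side checks.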

**Request Example**:

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
```

**Non-streaming Response Example**:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
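To illustrate how the fields of this response fit together, here is a small sketch that pulls out the reply text, the finish reason, and the token usage. `summarize_chat_response` is an illustrative name, not part of the API.

```python
def summarize_chat_response(body):
    """Pull the reply text, finish reason, and token usage from a response body."""
    choice = body["choices"][0]  # first choice; with the default n=1 it is the only one
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the response example above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm doing well, thank you for asking. How can I help you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
summary = summarize_chat_response(sample)
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

In OpenAI's convention, which this API is described as compatible with, a `finish_reason` other than `stop` (for example `length`) usually indicates a truncated reply; whether every model here reports it the same way is an assumption.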

**Streaming Response Example**:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]
```
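The stream consists of `data:` lines separated by blank lines and ends with the `[DONE]` sentinel. The sketch below shows one way to reassemble the reply from such lines without any network access; `iter_content_deltas` is a hypothetical helper that reads only the fields shown in the example above.

```python
import json


def iter_content_deltas(lines):
    """Yield the text fragments carried by the `data:` lines of a chat stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the first chunk carries only the role, no content
                yield text


# Shortened copies of the chunks from the example above.
sample_stream = [
    'data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample_stream)))  # prints "Hello!"
```

With `n` greater than 1, fragments from different choices would be interleaved; grouping them by `index` is left out of this sketch.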

**Status Codes**:

* `200`: Success
* `400`: Bad Request
* `401`: Unauthorized
* `402`: Billing Check Failed
* `404`: Model Not Found
* `429`: Rate Limit Exceeded
* `500`: Internal Server Error

### 3. Text Completion API

#### **3.1 Text Completion**

**Endpoint**: `POST /ai/completions`

**Description**: Creates a text completion. Supports both streaming and non-streaming responses.

**Authentication**: Access Key

**Request Parameters**:

| Parameter           | Type         | Required | Description                                      |
| ------------------- | ------------ | -------- | ------------------------------------------------ |
| `model`             | string       | Yes      | The ID of the model to use                       |
| `prompt`            | string/array | Yes      | Prompt text                                      |
| `suffix`            | string       | No       | Suffix to append after inserted text             |
| `max_tokens`        | integer      | No       | Maximum number of tokens to generate             |
| `temperature`       | number       | No       | Sampling temperature, 0-2, default 1             |
| `top_p`             | number       | No       | Nucleus sampling parameter, 0-1, default 1       |
| `n`                 | integer      | No       | Number of responses to generate, default 1       |
| `stream`            | boolean      | No       | Whether to use streaming response, default false |
| `logprobs`          | integer      | No       | Return log probabilities for most likely tokens  |
| `echo`              | boolean      | No       | Whether to echo the prompt, default false        |
| `stop`              | string/array | No       | Sequence(s) at which generation stops            |
| `presence_penalty`  | number       | No       | Presence penalty, -2.0 to 2.0, default 0         |
| `frequency_penalty` | number       | No       | Frequency penalty, -2.0 to 2.0, default 0        |
| `best_of`           | integer      | No       | Select from best candidates, default 1           |
| `logit_bias`        | object       | No       | Modify sampling probability for specified tokens |
| `user`              | string       | No       | User identifier                                  |

**Request Example**:

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "prompt": "Complete this sentence: The quick brown fox",
  "max_tokens": 20,
  "temperature": 0.5
}
```

**Response Example**:

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "text": " jumps over the lazy dog",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
```

**Status Codes**:

* `200`: Success
* `400`: Bad Request
* `401`: Unauthorized
* `402`: Billing Check Failed
* `404`: Model Not Found
* `429`: Rate Limit Exceeded
* `500`: Internal Server Error
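The usage examples later in this document cover only chat completion, so here is a comparable sketch for the text completion endpoint. It uses the standard library rather than `requests`; `build_completion_request` and `complete_text` are illustrative names, and the default values mirror the request example above.

```python
import json
import urllib.request

API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"


def build_completion_request(prompt, model="qwen/qwen3-coder-480b-instruct-fp8",
                             max_tokens=20, temperature=0.5):
    """Build (but do not send) a POST request for /ai/completions."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{API_BASE}/ai/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def complete_text(prompt, timeout=30):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["text"]


request = build_completion_request("Complete this sentence: The quick brown fox")
print(request.get_method(), request.full_url)
```

`complete_text("...")` performs the network call; non-2xx responses surface as `urllib.error.HTTPError`.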

## Error Code Details

### Common Error Codes

| Error Code | HTTP Status | Description           | Solution                                           |
| ---------- | ----------- | --------------------- | -------------------------------------------------- |
| `400`      | 400         | Bad Request           | Check request parameter format and required fields |
| `401`      | 401         | Unauthorized          | Provide valid Access Key or JWT Token              |
| `402`      | 402         | Billing Check Failed  | Check account balance and billing status           |
| `403`      | 403         | Forbidden             | Confirm user permissions and roles                 |
| `404`      | 404         | Resource Not Found    | Check if resource ID is correct                    |
| `429`      | 429         | Rate Limit Exceeded   | Reduce request frequency or contact administrator  |
| `500`      | 500         | Internal Server Error | Contact technical support                          |

### AI-Specific Error Codes

| Error Code                | Description                                    | Solution                                                    |
| ------------------------- | ---------------------------------------------- | ----------------------------------------------------------- |
| `model_not_found`         | Specified model does not exist                 | Check model ID or use `/ai/models` to get available models  |
| `model_not_supported`     | Model does not support requested functionality | Check model capabilities or use other models                |
| `context_length_exceeded` | Input exceeds model context length limit       | Reduce input length or use models supporting longer context |
| `invalid_parameters`      | Parameter values are invalid                   | Check parameter ranges and formats                          |
| `billing_check_failed`    | Billing check failed                           | Check account balance and billing configuration             |
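One way to act on these codes is to map each to its suggested solution and to decide whether a retry makes sense. The sketch below assumes the error body has the shape `{"error": {"code": ..., "message": ...}}`; the `message` field appears in the error handling example in this document, while the `code` field is an assumption. `classify_error` is a hypothetical helper.

```python
# Status codes worth retrying, matching the retry example in this document.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

# Solutions taken from the AI-specific error code table.
SOLUTIONS = {
    "model_not_found": "Check the model ID or use /ai/models to get available models",
    "model_not_supported": "Check model capabilities or use another model",
    "context_length_exceeded": "Reduce input length or use a model with a longer context",
    "invalid_parameters": "Check parameter ranges and formats",
    "billing_check_failed": "Check account balance and billing configuration",
}


def classify_error(status, body):
    """Summarize an error response: code, message, retry advice, and a hint."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    if not isinstance(error, dict):
        error = {"message": str(error)}  # tolerate a plain-string error field
    code = error.get("code")  # assumed field name
    return {
        "code": code,
        "message": error.get("message", ""),
        "retryable": status in RETRYABLE_STATUS,
        "hint": SOLUTIONS.get(code, "See the Common Error Codes table"),
    }


result = classify_error(
    404, {"error": {"code": "model_not_found", "message": "no such model"}}
)
print(result["retryable"], "-", result["hint"])
```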

## Usage Examples

### Python Examples

```python
import requests
import json

# Configuration
API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"

# Chat completion request
def chat_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 100
    }
    
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data
    )
    
    return response.json()

# Usage example
result = chat_completion("Hello, how are you?")
print(result)
```

### JavaScript Examples

```javascript
// Configuration
const API_BASE = "https://apis.gradient.network/api/v1";
const API_KEY = "your-access-key-here";

// Chat completion request
async function chatCompletion(prompt, model = "qwen/qwen3-coder-480b-instruct-fp8") {
    const response = await fetch(`${API_BASE}/ai/chat/completions`, {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${API_KEY}`,
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            model: model,
            messages: [{ role: 'user', content: prompt }],
            temperature: 0.7,
            max_tokens: 100
        })
    });
    
    return await response.json();
}

// Usage example
chatCompletion("Hello, how are you?")
    .then(result => console.log(result))
    .catch(error => console.error(error));
```

### cURL Examples

```bash
# Chat completion
curl -X POST "https://apis.gradient.network/api/v1/ai/chat/completions" \
  -H "Authorization: Bearer your-access-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Get model list
curl "https://apis.gradient.network/api/v1/ai/models"
```

## Rate Limits and Quotas

### Rate Limits

* **Free Users**: 60 requests per minute
* **Paid Users**: Based on plan, typically 1000-10000 requests per minute

### Token Limits

* **Input Tokens**: Based on model context length limits
* **Output Tokens**: Based on model capabilities and billing limits

### Concurrency Limits

* **Free Users**: Maximum 3 concurrent requests
* **Paid Users**: Based on plan, typically 10-100 concurrent requests
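To stay under a per-minute limit, a client can throttle itself before sending. The following is a minimal sliding-window sketch sized for the free tier (60 requests per minute); how the server actually counts requests is not documented here, so the window model is an assumption, and `SlidingWindowLimiter` is an illustrative name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def acquire(self, sleep=time.sleep):
        """Block until a request may be sent, then record it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self._sent.append(self.clock())


limiter = SlidingWindowLimiter(max_requests=60, window_seconds=60.0)
limiter.acquire()  # call this before each API request
print(limiter.wait_time())
```

Concurrency limits are a separate concern; a bounded semaphore around in-flight requests is one common way to respect them.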

## Best Practices

### 1. Error Handling

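```python
# Reuses chat_completion() and the imports from the Python example above.
prompt = "Hello, how are you?"

try:
    response = chat_completion(prompt)
    if response.get('error'):
        # The API reported a failure in the response body.
        print(f"Error: {response['error']['message']}")
    else:
        print(response['choices'][0]['message']['content'])
except requests.exceptions.RequestException as e:
    # Connection errors, timeouts, and similar transport failures.
    print(f"Request failed: {e}")
```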

### 2. Streaming Processing

```python
# Relies on requests, json, API_BASE, and API_KEY from the Python example above.
def stream_chat_completion(prompt, model):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True
    }

    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=payload,
        stream=True
    )

    for line in response.iter_lines():
        if not line:
            continue  # Skip the blank lines between events
        line = line.decode('utf-8')
        if not line.startswith('data: '):
            continue
        chunk = line[6:]  # Remove 'data: ' prefix
        if chunk == '[DONE]':
            break
        try:
            json_data = json.loads(chunk)
        except json.JSONDecodeError:
            continue  # Ignore malformed chunks
        choices = json_data.get('choices') or [{}]
        content = choices[0].get('delta', {}).get('content', '')
        if content:
            print(content, end='', flush=True)
```

### 3. Retry Mechanism

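```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retry():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        # POST is not retried on status codes by default; opt in explicitly
        # (urllib3 >= 1.26). A retried POST may repeat work the server already did.
        allowed_methods=["GET", "POST"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Use retry session (API_BASE and API_KEY as defined in the Python example above)
session = create_session_with_retry()
url = f"{API_BASE}/ai/chat/completions"
headers = {"Authorization": f"Bearer {API_KEY}"}
data = {
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
}
response = session.post(url, headers=headers, json=data)
```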

## Support

If you encounter issues during usage, please:

1. Review the error code descriptions in this document
2. Check request parameters and authentication information
3. Contact the technical support team
4. Check the system status page

