API Reference Documentation

This document describes the OpenAI-compatible AI service interfaces, including model management, chat completion, and text completion functionality.

Overview

Our AI API implements OpenAI-compatible interfaces that support:

  • Model listing and querying

  • Chat completion (streaming and non-streaming)

  • Text completion (streaming and non-streaming)

  • Administrator model management

Basic Information

  • Base URL: https://apis.gradient.network/api/v1

  • Authentication: Access Key

  • Content Type: application/json

  • API Version: v1

Authentication

Access Key Authentication

Authorization: Bearer your-access-key-here

API Endpoints

1. Model Management APIs

1.1 List All Models

Endpoint: GET /ai/models

Description: Returns a list of all available AI models (no authentication required)

Request Parameters: None

Response Example:

{
  "object": "list",
  "data": [
    {
      "id": "qwen/qwen3-coder-480b-instruct-fp8",
      "object": "model",
      "created": 1640995200,
      "owned_by": "qwen",
      "permission": [],
      "root": "qwen/qwen3-coder-480b-instruct-fp8",
      "parent": null
    }
  ]
}
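
A quick way to discover model IDs is to script this endpoint. A minimal sketch in Python using the requests library (no API key is needed for this endpoint):

import requests

# GET /ai/models requires no authentication
resp = requests.get("https://apis.gradient.network/api/v1/ai/models")
resp.raise_for_status()

model_ids = [model["id"] for model in resp.json()["data"]]
print(model_ids)  # e.g. ['qwen/qwen3-coder-480b-instruct-fp8', ...]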

Status Codes:

  • 200: Success

  • 500: Internal Server Error

2. Chat Completion API

2.1 Chat Completion

Endpoint: POST /ai/chat/completions

Description: Create a chat completion request, supporting both streaming and non-streaming responses

Authentication: Access Key

Request Parameters:

Parameter          Type          Required  Description
model              string        Yes       The ID of the model to use
messages           array         Yes       Array of conversation messages
stream             boolean       No        Whether to use streaming response, default false
max_tokens         integer       No        Maximum number of tokens to generate
temperature        number        No        Sampling temperature, 0-2, default 1
top_p              number        No        Nucleus sampling parameter, 0-1, default 1
n                  integer       No        Number of responses to generate, default 1
stop               string/array  No        Stop generation tokens
presence_penalty   number        No        Presence penalty, -2.0 to 2.0, default 0
frequency_penalty  number        No        Frequency penalty, -2.0 to 2.0, default 0
logit_bias         object        No        Modify sampling probability for specified tokens
user               string        No        User identifier

Request Example:

{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}

Non-streaming Response Example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

Streaming Response Example:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]

Status Codes:

  • 200: Success

  • 400: Bad Request

  • 401: Unauthorized

  • 402: Billing Check Failed

  • 404: Model Not Found

  • 429: Rate Limit Exceeded

  • 500: Internal Server Error

3. Text Completion API

3.1 Text Completion

Endpoint: POST /ai/completions

Description: Create a text completion request, supporting both streaming and non-streaming responses

Authentication: Access Key

Request Parameters:

Parameter          Type          Required  Description
model              string        Yes       The ID of the model to use
prompt             string/array  Yes       Prompt text
suffix             string        No        Suffix to append after inserted text
max_tokens         integer       No        Maximum number of tokens to generate
temperature        number        No        Sampling temperature, 0-2, default 1
top_p              number        No        Nucleus sampling parameter, 0-1, default 1
n                  integer       No        Number of responses to generate, default 1
stream             boolean       No        Whether to use streaming response, default false
logprobs           integer       No        Return log probabilities for the most likely tokens
echo               boolean       No        Whether to echo the prompt, default false
stop               string/array  No        Stop generation tokens
presence_penalty   number        No        Presence penalty, -2.0 to 2.0, default 0
frequency_penalty  number        No        Frequency penalty, -2.0 to 2.0, default 0
best_of            integer       No        Select from best candidates, default 1
logit_bias         object        No        Modify sampling probability for specified tokens
user               string        No        User identifier

Request Example:

{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "prompt": "Complete this sentence: The quick brown fox",
  "max_tokens": 20,
  "temperature": 0.5
}

Response Example:

{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "text": " jumps over the lazy dog",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
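
As with chat completions, this endpoint is a plain HTTP POST. A minimal sketch in Python, using the requests library and the placeholder key from the examples in this document:

import requests

API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"

response = requests.post(
    f"{API_BASE}/ai/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "qwen/qwen3-coder-480b-instruct-fp8",
        "prompt": "Complete this sentence: The quick brown fox",
        "max_tokens": 20,
        "temperature": 0.5
    }
)
print(response.json()["choices"][0]["text"])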

Status Codes:

  • 200: Success

  • 400: Bad Request

  • 401: Unauthorized

  • 402: Billing Check Failed

  • 404: Model Not Found

  • 429: Rate Limit Exceeded

  • 500: Internal Server Error

Error Code Details

Common Error Codes

HTTP Status  Description            Solution
400          Bad Request            Check request parameter format and required fields
401          Unauthorized           Provide a valid Access Key or JWT token
402          Billing Check Failed   Check account balance and billing status
403          Forbidden              Confirm user permissions and roles
404          Resource Not Found     Check that the resource ID is correct
429          Rate Limit Exceeded    Reduce request frequency or contact an administrator
500          Internal Server Error  Contact technical support

AI-Specific Error Codes

Error Code               Description                                          Solution
model_not_found          Specified model does not exist                       Check the model ID or use /ai/models to get available models
model_not_supported      Model does not support the requested functionality   Check model capabilities or use other models
context_length_exceeded  Input exceeds the model's context length limit       Reduce input length or use a model supporting longer context
invalid_parameters       Parameter values are invalid                         Check parameter ranges and formats
billing_check_failed     Billing check failed                                 Check account balance and billing configuration
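
A client can branch on these codes when a request fails. Below is a minimal sketch in Python, assuming the error payload follows the OpenAI-style envelope used in the error-handling example under Best Practices ({"error": {"message": ...}}); the code field here is an assumption for illustration and should be checked against live responses:

def handle_api_error(payload):
    # Assumes an OpenAI-style error envelope; the 'code' field is an
    # assumption and should be verified against live responses.
    error = payload.get("error")
    if error is None:
        return False  # no error in this payload
    code = error.get("code")
    if code == "model_not_found":
        print("Unknown model; call GET /ai/models for available IDs.")
    elif code == "context_length_exceeded":
        print("Input too long; shorten it or pick a longer-context model.")
    elif code == "billing_check_failed":
        print("Billing problem; check account balance and configuration.")
    else:
        print(f"API error {code}: {error.get('message')}")
    return True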

Usage Examples

Python Examples

import requests
import json

# Configuration
API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"

# Chat completion request
def chat_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 100
    }
    
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data
    )
    
    return response.json()

# Usage example
result = chat_completion("Hello, how are you?")
print(result)

JavaScript Examples

// Configuration
const API_BASE = "https://apis.gradient.network/api/v1";
const API_KEY = "your-access-key-here";

// Chat completion request
async function chatCompletion(prompt, model = "qwen/qwen3-coder-480b-instruct-fp8") {
    const response = await fetch(`${API_BASE}/ai/chat/completions`, {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${API_KEY}`,
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            model: model,
            messages: [{ role: 'user', content: prompt }],
            temperature: 0.7,
            max_tokens: 100
        })
    });
    
    return await response.json();
}

// Usage example
chatCompletion("Hello, how are you?")
    .then(result => console.log(result))
    .catch(error => console.error(error));

cURL Examples

# Chat completion
curl -X POST "https://apis.gradient.network/api/v1/ai/chat/completions" \
  -H "Authorization: Bearer your-access-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Get model list
curl "https://apis.gradient.network/api/v1/ai/models"

Rate Limits and Quotas

Rate Limits

  • Free Users: 60 requests per minute

  • Paid Users: Based on plan, typically 1000-10000 requests per minute

Token Limits

  • Input Tokens: Based on model context length limits

  • Output Tokens: Based on model capabilities and billing limits

Concurrency Limits

  • Free Users: Maximum 3 concurrent requests (a client-side cap is sketched after this list)

  • Paid Users: Based on plan, typically 10-100 concurrent requests
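
To stay within these caps, a client can limit how many requests it keeps in flight. A minimal sketch using Python's thread pool, reusing the chat_completion helper from the Python Examples above; the limit of 3 matches the free tier:

from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # free-tier concurrency limit

prompts = ["Hello!", "What is 2 + 2?", "Name a color.", "Tell me a joke."]

# Capping the pool size keeps at most MAX_CONCURRENT requests in flight.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(chat_completion, prompts))

for result in results:
    print(result["choices"][0]["message"]["content"])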

Best Practices

1. Error Handling

try:
    response = chat_completion(prompt)
    if response.get('error'):
        print(f"Error: {response['error']['message']}")
    else:
        print(response['choices'][0]['message']['content'])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

2. Streaming Processing

def stream_chat_completion(prompt, model):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True
    }
    
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data,
        stream=True
    )
    
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data = line[6:]  # Remove 'data: ' prefix
                if data == '[DONE]':
                    break
                try:
                    json_data = json.loads(data)
                    content = json_data['choices'][0]['delta'].get('content', '')
                    if content:
                        print(content, end='', flush=True)
                except json.JSONDecodeError:
                    continue
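
# Usage example: prints tokens as they arrive
stream_chat_completion("Hello, how are you?", "qwen/qwen3-coder-480b-instruct-fp8")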

3. Retry Mechanism

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retry():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"]  # urllib3 does not retry POST by default
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Use the retry-enabled session
session = create_session_with_retry()
response = session.post(f"{API_BASE}/ai/chat/completions", headers=headers, json=data)

Support

If you encounter issues while using the API, please:

  1. Review the error code descriptions in this document

  2. Check request parameters and authentication information

  3. Contact the technical support team

  4. Check the system status page

Technical Support Email: [email protected]
API Status Page: https://status.your-domain.com
