# API Reference Documentation
This document describes the OpenAI-compatible AI service interfaces, including model management, chat completion, and text completion functionality.
## Overview

Our AI API implements OpenAI-compatible interfaces that support:

- Model listing and querying
- Chat completion (streaming and non-streaming)
- Text completion (streaming and non-streaming)
- Administrator model management
## Basic Information

- **Base URL:** `https://apis.gradient.network/api/v1`
- **Authentication:** Access Key
- **Content Type:** `application/json`
- **API Version:** v1
## Authentication

### Access Key Authentication

Pass your access key as a Bearer token in the `Authorization` header of every authenticated request:

```
Authorization: Bearer your-access-key-here
```
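In Python, for example, the header can be attached to every call through a `requests.Session` (a minimal sketch; the key value is a placeholder):

```python
import requests

session = requests.Session()
# Placeholder access key; replace with your own.
session.headers.update({
    "Authorization": "Bearer your-access-key-here",
    "Content-Type": "application/json",
})
# Subsequent calls through `session` now carry the Authorization header.
```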
## API Endpoints

### 1. Model Management APIs

#### 1.1 List All Models

**Endpoint:** `GET /ai/models`

**Description:** Lists all available AI models (no authentication required).

**Request Parameters:** None

**Response Example:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "qwen/qwen3-coder-480b-instruct-fp8",
      "object": "model",
      "created": 1640995200,
      "owned_by": "qwen",
      "permission": [],
      "root": "qwen/qwen3-coder-480b-instruct-fp8",
      "parent": null
    }
  ]
}
```
**Error Codes:**

- `200`: Success
- `500`: Internal Server Error
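Since no authentication is required, the endpoint can be exercised directly; a minimal Python sketch, assuming the response shape shown above:

```python
import requests

# Fetch the model list (no authentication required) and print each model ID.
resp = requests.get("https://apis.gradient.network/api/v1/ai/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```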
### 2. Chat Completion API

#### 2.1 Chat Completion

**Endpoint:** `POST /ai/chat/completions`

**Description:** Create a chat completion request, supporting both streaming and non-streaming responses.

**Authentication:** Access Key

**Request Parameters:**
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use |
| `messages` | array | Yes | Array of conversation messages |
| `stream` | boolean | No | Whether to use streaming response; default `false` |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | number | No | Sampling temperature, 0-2; default 1 |
| `top_p` | number | No | Nucleus sampling parameter, 0-1; default 1 |
| `n` | integer | No | Number of responses to generate; default 1 |
| `stop` | string/array | No | Stop generation tokens |
| `presence_penalty` | number | No | Presence penalty, -2.0 to 2.0; default 0 |
| `frequency_penalty` | number | No | Frequency penalty, -2.0 to 2.0; default 0 |
| `logit_bias` | object | No | Modify sampling probability for specified tokens |
| `user` | string | No | User identifier |
**Request Example:**

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
```
**Non-streaming Response Example:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
**Streaming Response Example:**

Streaming responses are delivered as server-sent events: one `data:` line per chunk, terminated by `data: [DONE]` (see the streaming processing example under Best Practices for client-side parsing).

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
```
**Error Codes:**

- `200`: Success
- `400`: Bad Request
- `401`: Unauthorized
- `402`: Billing Check Failed
- `404`: Model Not Found
- `429`: Rate Limit Exceeded
- `500`: Internal Server Error
### 3. Text Completion API

#### 3.1 Text Completion

**Endpoint:** `POST /ai/completions`

**Description:** Create a text completion request, supporting both streaming and non-streaming responses.

**Authentication:** Access Key

**Request Parameters:**
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use |
| `prompt` | string/array | Yes | Prompt text |
| `suffix` | string | No | Suffix to append after inserted text |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | number | No | Sampling temperature, 0-2; default 1 |
| `top_p` | number | No | Nucleus sampling parameter, 0-1; default 1 |
| `n` | integer | No | Number of responses to generate; default 1 |
| `stream` | boolean | No | Whether to use streaming response; default `false` |
| `logprobs` | integer | No | Return log probabilities for the most likely tokens |
| `echo` | boolean | No | Whether to echo the prompt; default `false` |
| `stop` | string/array | No | Stop generation tokens |
| `presence_penalty` | number | No | Presence penalty, -2.0 to 2.0; default 0 |
| `frequency_penalty` | number | No | Frequency penalty, -2.0 to 2.0; default 0 |
| `best_of` | integer | No | Select from best candidates; default 1 |
| `logit_bias` | object | No | Modify sampling probability for specified tokens |
| `user` | string | No | User identifier |
**Request Example:**

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "prompt": "Complete this sentence: The quick brown fox",
  "max_tokens": 20,
  "temperature": 0.5
}
```
**Response Example:**

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "text": " jumps over the lazy dog",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
```
**Error Codes:**

- `200`: Success
- `400`: Bad Request
- `401`: Unauthorized
- `402`: Billing Check Failed
- `404`: Model Not Found
- `429`: Rate Limit Exceeded
- `500`: Internal Server Error
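The usage examples below cover chat completion; here is a minimal Python sketch of a non-streaming call to this endpoint, mirroring the request example above (the access key is a placeholder):

```python
import requests

API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"  # placeholder

def text_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    # POST /ai/completions with the documented required and optional fields.
    response = requests.post(
        f"{API_BASE}/ai/completions",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "prompt": prompt,
            "max_tokens": 20,
            "temperature": 0.5,
        },
    )
    return response.json()

result = text_completion("Complete this sentence: The quick brown fox")
print(result["choices"][0]["text"])
```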
## Error Code Details

### Common Error Codes

| Status Code | Description | Solution |
| --- | --- | --- |
| `400` | Bad Request | Check request parameter format and required fields |
| `401` | Unauthorized | Provide a valid Access Key or JWT Token |
| `402` | Billing Check Failed | Check account balance and billing status |
| `403` | Forbidden | Confirm user permissions and roles |
| `404` | Resource Not Found | Check whether the resource ID is correct |
| `429` | Rate Limit Exceeded | Reduce request frequency or contact an administrator |
| `500` | Internal Server Error | Contact technical support |
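Before parsing a response body, a client can branch on these documented status codes; a minimal sketch, where `resp` is a `requests.Response`:

```python
def check_status(resp):
    # Map the documented status codes to actions; raise for anything else.
    if resp.status_code == 200:
        return resp.json()
    if resp.status_code == 401:
        raise RuntimeError("Unauthorized: provide a valid Access Key or JWT Token")
    if resp.status_code == 429:
        raise RuntimeError("Rate limit exceeded: reduce request frequency")
    resp.raise_for_status()  # raises requests.HTTPError for other 4xx/5xx codes
```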
### AI-Specific Error Codes

| Error Code | Description | Solution |
| --- | --- | --- |
| `model_not_found` | Specified model does not exist | Check the model ID or use `/ai/models` to get available models |
| `model_not_supported` | Model does not support the requested functionality | Check model capabilities or use another model |
| `context_length_exceeded` | Input exceeds the model's context length limit | Reduce input length or use a model supporting longer context |
| `invalid_parameters` | Parameter values are invalid | Check parameter ranges and formats |
| `billing_check_failed` | Billing check failed | Check account balance and billing configuration |
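A sketch of dispatching on these codes, assuming error responses carry an OpenAI-style `{"error": {"code": ..., "message": ...}}` body, consistent with the error handling example under Best Practices (the exact envelope may differ):

```python
def handle_ai_error(body):
    """Print a remedy for a failed response body; assumes an OpenAI-style error envelope."""
    error = body.get("error")
    if not error:
        return False  # no error present
    code = error.get("code")
    message = error.get("message", "")
    if code == "model_not_found":
        print(f"Unknown model: {message} -- use GET /ai/models to list available models.")
    elif code == "context_length_exceeded":
        print(f"Input too long: {message} -- shorten the prompt or pick a longer-context model.")
    elif code == "billing_check_failed":
        print(f"Billing problem: {message} -- check account balance and billing configuration.")
    else:
        print(f"API error {code}: {message}")
    return True
```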
## Usage Examples

### Python Examples

```python
import requests
import json

# Configuration
API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"

# Chat completion request
def chat_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 100
    }
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data
    )
    return response.json()

# Usage example
result = chat_completion("Hello, how are you?")
print(result)
```
### JavaScript Examples

```javascript
// Configuration
const API_BASE = "https://apis.gradient.network/api/v1";
const API_KEY = "your-access-key-here";

// Chat completion request
async function chatCompletion(prompt, model = "qwen/qwen3-coder-480b-instruct-fp8") {
  const response = await fetch(`${API_BASE}/ai/chat/completions`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: model,
      messages: [{ role: 'user', content: prompt }],
      temperature: 0.7,
      max_tokens: 100
    })
  });
  return await response.json();
}

// Usage example
chatCompletion("Hello, how are you?")
  .then(result => console.log(result))
  .catch(error => console.error(error));
```
### cURL Examples

```bash
# Chat completion
curl -X POST "https://apis.gradient.network/api/v1/ai/chat/completions" \
  -H "Authorization: Bearer your-access-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Get model list
curl "https://apis.gradient.network/api/v1/ai/models"
```
## Rate Limits and Quotas

### Rate Limits

- **Free Users:** 60 requests per minute
- **Paid Users:** Based on plan, typically 1,000-10,000 requests per minute

### Token Limits

- **Input Tokens:** Based on model context length limits
- **Output Tokens:** Based on model capabilities and billing limits

### Concurrency Limits

- **Free Users:** Maximum 3 concurrent requests (a client-side cap is sketched below)
- **Paid Users:** Based on plan, typically 10-100 concurrent requests
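To stay under the concurrency cap from a multithreaded client, a semaphore works; a minimal sketch reusing the `chat_completion` helper from the Python examples above:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Free tier allows at most 3 concurrent requests; gate calls with a semaphore.
limit = threading.Semaphore(3)

def limited_chat_completion(prompt):
    with limit:
        return chat_completion(prompt)  # helper defined in the Python examples above

prompts = ["Hello, how are you?"] * 8
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(limited_chat_completion, prompts))
```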
## Best Practices

### 1. Error Handling

```python
try:
    response = chat_completion(prompt)
    if response.get('error'):
        print(f"Error: {response['error']['message']}")
    else:
        print(response['choices'][0]['message']['content'])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
### 2. Streaming Processing

```python
def stream_chat_completion(prompt, model):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True
    }
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data,
        stream=True
    )
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                chunk = line[6:]  # Remove the 'data: ' prefix
                if chunk == '[DONE]':
                    break
                try:
                    json_data = json.loads(chunk)
                    content = json_data['choices'][0]['delta'].get('content', '')
                    if content:
                        print(content, end='', flush=True)
                except json.JSONDecodeError:
                    continue
```
### 3. Retry Mechanism

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # modern import path; requests.packages is deprecated

def create_session_with_retry():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"]  # POST is not retried by default
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Use retry session
session = create_session_with_retry()
response = session.post(url, headers=headers, json=data)
```
## Support

If you encounter issues during usage, please:

1. Review the error code descriptions in this document
2. Check request parameters and authentication information
3. Contact the technical support team
4. Check the system status page

- **Technical Support Email:** [email protected]
- **API Status Page:** https://status.your-domain.com