# API Reference Documentation
This document describes the OpenAI-compatible AI service interfaces, including model management, chat completion, and text completion functionality.
## Overview

Our AI API implements OpenAI-compatible interfaces that support:

- Model listing and querying
- Chat completion (streaming and non-streaming)
- Text completion (streaming and non-streaming)
- Administrator model management
## Basic Information

- **Base URL:** `https://apis.gradient.network/api/v1`
- **Authentication:** Access Key
- **Content Type:** `application/json`
- **API Version:** v1
## Authentication

### Access Key Authentication

Pass your access key as a Bearer token in the `Authorization` header of every authenticated request:

```
Authorization: Bearer your-access-key-here
```
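In Python, for example, the header can be attached to every call through a `requests.Session` (a minimal sketch; the key value is a placeholder):

```python
import requests

session = requests.Session()
# Placeholder access key; replace with your own.
session.headers.update({
    "Authorization": "Bearer your-access-key-here",
    "Content-Type": "application/json",
})
# Subsequent calls through `session` now carry the Authorization header.
```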
## API Endpoints

### 1. Model Management APIs

#### 1.1 List All Models

**Endpoint:** `GET /ai/models`

**Description:** Lists all available AI models (no authentication required).

**Request Parameters:** None

**Response Example:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "qwen/qwen3-coder-480b-instruct-fp8",
      "object": "model",
      "created": 1640995200,
      "owned_by": "qwen",
      "permission": [],
      "root": "qwen/qwen3-coder-480b-instruct-fp8",
      "parent": null
    }
  ]
}
```
**Error Codes:**

- `200`: Success
- `500`: Internal Server Error
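Since no authentication is required, the endpoint can be exercised directly; a minimal Python sketch, assuming the response shape shown above:

```python
import requests

# Fetch the model list (no authentication required) and print each model ID.
resp = requests.get("https://apis.gradient.network/api/v1/ai/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```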
### 2. Chat Completion API

#### 2.1 Chat Completion

**Endpoint:** `POST /ai/chat/completions`

**Description:** Create a chat completion request, supporting both streaming and non-streaming responses.

**Authentication:** Access Key

**Request Parameters:**
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use |
| `messages` | array | Yes | Array of conversation messages |
| `stream` | boolean | No | Whether to use streaming response; default `false` |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | number | No | Sampling temperature, 0-2; default 1 |
| `top_p` | number | No | Nucleus sampling parameter, 0-1; default 1 |
| `n` | integer | No | Number of responses to generate; default 1 |
| `stop` | string/array | No | Stop generation tokens |
| `presence_penalty` | number | No | Presence penalty, -2.0 to 2.0; default 0 |
| `frequency_penalty` | number | No | Frequency penalty, -2.0 to 2.0; default 0 |
| `logit_bias` | object | No | Modify sampling probability for specified tokens |
| `user` | string | No | User identifier |
**Request Example:**

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
```
**Non-streaming Response Example:**

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
**Streaming Response Example:**

Streaming responses are delivered as server-sent events: one `data:` line per chunk, terminated by `data: [DONE]` (see the streaming processing example under Best Practices for client-side parsing).

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"qwen/qwen3-coder-480b-instruct-fp8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
```
**Error Codes:**

- `200`: Success
- `400`: Bad Request
- `401`: Unauthorized
- `402`: Billing Check Failed
- `404`: Model Not Found
- `429`: Rate Limit Exceeded
- `500`: Internal Server Error
### 3. Text Completion API

#### 3.1 Text Completion

**Endpoint:** `POST /ai/completions`

**Description:** Create a text completion request, supporting both streaming and non-streaming responses.

**Authentication:** Access Key

**Request Parameters:**
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The ID of the model to use |
| `prompt` | string/array | Yes | Prompt text |
| `suffix` | string | No | Suffix to append after inserted text |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | number | No | Sampling temperature, 0-2; default 1 |
| `top_p` | number | No | Nucleus sampling parameter, 0-1; default 1 |
| `n` | integer | No | Number of responses to generate; default 1 |
| `stream` | boolean | No | Whether to use streaming response; default `false` |
| `logprobs` | integer | No | Return log probabilities for the most likely tokens |
| `echo` | boolean | No | Whether to echo the prompt; default `false` |
| `stop` | string/array | No | Stop generation tokens |
| `presence_penalty` | number | No | Presence penalty, -2.0 to 2.0; default 0 |
| `frequency_penalty` | number | No | Frequency penalty, -2.0 to 2.0; default 0 |
| `best_of` | integer | No | Select from best candidates; default 1 |
| `logit_bias` | object | No | Modify sampling probability for specified tokens |
| `user` | string | No | User identifier |
**Request Example:**

```json
{
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "prompt": "Complete this sentence: The quick brown fox",
  "max_tokens": 20,
  "temperature": 0.5
}
```
**Response Example:**

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "qwen/qwen3-coder-480b-instruct-fp8",
  "choices": [
    {
      "text": " jumps over the lazy dog",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12
  }
}
```
**Error Codes:**

- `200`: Success
- `400`: Bad Request
- `401`: Unauthorized
- `402`: Billing Check Failed
- `404`: Model Not Found
- `429`: Rate Limit Exceeded
- `500`: Internal Server Error
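The usage examples below cover chat completion; here is a minimal Python sketch of a non-streaming call to this endpoint, mirroring the request example above (the access key is a placeholder):

```python
import requests

API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"  # placeholder

def text_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    # POST /ai/completions with the documented required and optional fields.
    response = requests.post(
        f"{API_BASE}/ai/completions",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "prompt": prompt,
            "max_tokens": 20,
            "temperature": 0.5,
        },
    )
    return response.json()

result = text_completion("Complete this sentence: The quick brown fox")
print(result["choices"][0]["text"])
```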
## Error Code Details

### Common Error Codes

| Status Code | Description | Solution |
| --- | --- | --- |
| `400` | Bad Request | Check request parameter format and required fields |
| `401` | Unauthorized | Provide a valid Access Key or JWT Token |
| `402` | Billing Check Failed | Check account balance and billing status |
| `403` | Forbidden | Confirm user permissions and roles |
| `404` | Resource Not Found | Check whether the resource ID is correct |
| `429` | Rate Limit Exceeded | Reduce request frequency or contact an administrator |
| `500` | Internal Server Error | Contact technical support |
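Before parsing a response body, a client can branch on these documented status codes; a minimal sketch, where `resp` is a `requests.Response`:

```python
def check_status(resp):
    # Map the documented status codes to actions; raise for anything else.
    if resp.status_code == 200:
        return resp.json()
    if resp.status_code == 401:
        raise RuntimeError("Unauthorized: provide a valid Access Key or JWT Token")
    if resp.status_code == 429:
        raise RuntimeError("Rate limit exceeded: reduce request frequency")
    resp.raise_for_status()  # raises requests.HTTPError for other 4xx/5xx codes
```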
### AI-Specific Error Codes

| Error Code | Description | Solution |
| --- | --- | --- |
| `model_not_found` | Specified model does not exist | Check the model ID or use `/ai/models` to get available models |
| `model_not_supported` | Model does not support the requested functionality | Check model capabilities or use another model |
| `context_length_exceeded` | Input exceeds the model's context length limit | Reduce input length or use a model supporting longer context |
| `invalid_parameters` | Parameter values are invalid | Check parameter ranges and formats |
| `billing_check_failed` | Billing check failed | Check account balance and billing configuration |
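A sketch of dispatching on these codes, assuming error responses carry an OpenAI-style `{"error": {"code": ..., "message": ...}}` body, consistent with the error handling example under Best Practices (the exact envelope may differ):

```python
def handle_ai_error(body):
    """Print a remedy for a failed response body; assumes an OpenAI-style error envelope."""
    error = body.get("error")
    if not error:
        return False  # no error present
    code = error.get("code")
    message = error.get("message", "")
    if code == "model_not_found":
        print(f"Unknown model: {message} -- use GET /ai/models to list available models.")
    elif code == "context_length_exceeded":
        print(f"Input too long: {message} -- shorten the prompt or pick a longer-context model.")
    elif code == "billing_check_failed":
        print(f"Billing problem: {message} -- check account balance and billing configuration.")
    else:
        print(f"API error {code}: {message}")
    return True
```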
## Usage Examples

### Python Examples

```python
import requests
import json

# Configuration
API_BASE = "https://apis.gradient.network/api/v1"
API_KEY = "your-access-key-here"

# Chat completion request
def chat_completion(prompt, model="qwen/qwen3-coder-480b-instruct-fp8"):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 100
    }
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data
    )
    return response.json()

# Usage example
result = chat_completion("Hello, how are you?")
print(result)
```
### JavaScript Examples

```javascript
// Configuration
const API_BASE = "https://apis.gradient.network/api/v1";
const API_KEY = "your-access-key-here";

// Chat completion request
async function chatCompletion(prompt, model = "qwen/qwen3-coder-480b-instruct-fp8") {
  const response = await fetch(`${API_BASE}/ai/chat/completions`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: model,
      messages: [{ role: 'user', content: prompt }],
      temperature: 0.7,
      max_tokens: 100
    })
  });
  return await response.json();
}

// Usage example
chatCompletion("Hello, how are you?")
  .then(result => console.log(result))
  .catch(error => console.error(error));
```
### cURL Examples

```bash
# Chat completion
curl -X POST "https://apis.gradient.network/api/v1/ai/chat/completions" \
  -H "Authorization: Bearer your-access-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder-480b-instruct-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Get model list
curl "https://apis.gradient.network/api/v1/ai/models"
```
## Rate Limits and Quotas

### Rate Limits

- **Free Users:** 60 requests per minute
- **Paid Users:** Based on plan, typically 1,000-10,000 requests per minute

### Token Limits

- **Input Tokens:** Based on model context length limits
- **Output Tokens:** Based on model capabilities and billing limits

### Concurrency Limits

- **Free Users:** Maximum 3 concurrent requests (a client-side cap is sketched below)
- **Paid Users:** Based on plan, typically 10-100 concurrent requests
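To stay under the concurrency cap from a multithreaded client, a semaphore works; a minimal sketch reusing the `chat_completion` helper from the Python examples above:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Free tier allows at most 3 concurrent requests; gate calls with a semaphore.
limit = threading.Semaphore(3)

def limited_chat_completion(prompt):
    with limit:
        return chat_completion(prompt)  # helper defined in the Python examples above

prompts = ["Hello, how are you?"] * 8
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(limited_chat_completion, prompts))
```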
## Best Practices

### 1. Error Handling

```python
try:
    response = chat_completion(prompt)
    if response.get('error'):
        print(f"Error: {response['error']['message']}")
    else:
        print(response['choices'][0]['message']['content'])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
### 2. Streaming Processing

```python
def stream_chat_completion(prompt, model):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True
    }
    response = requests.post(
        f"{API_BASE}/ai/chat/completions",
        headers=headers,
        json=data,
        stream=True
    )
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                chunk = line[6:]  # Remove the 'data: ' prefix
                if chunk == '[DONE]':
                    break
                try:
                    json_data = json.loads(chunk)
                    content = json_data['choices'][0]['delta'].get('content', '')
                    if content:
                        print(content, end='', flush=True)
                except json.JSONDecodeError:
                    continue
```
### 3. Retry Mechanism

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # modern import path; requests.packages is deprecated

def create_session_with_retry():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"]  # POST is not retried by default
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Use retry session
session = create_session_with_retry()
response = session.post(url, headers=headers, json=data)
```
## Support

If you encounter issues during usage, please:

1. Review the error code descriptions in this document
2. Check request parameters and authentication information
3. Contact the technical support team
4. Check the system status page

- **Technical Support Email:** [email protected]
- **API Status Page:** https://status.your-domain.com