Cerebras Provider
Cerebras high-performance inference with ultra-fast throughput
Available Models

GLM-4.7
Family: glm
Model ID: glm-4.7
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k
Input: $2.25 /M tokens
Cached: not listed
Output: $2.75 /M tokens

GLM-4.6
Family: glm
Model ID: glm-4.6
Status: Deactivated since Jan 20, 2026
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k
Input: $2.25 /M tokens
Cached: not listed
Output: $2.75 /M tokens

GPT OSS 120B
Family: openai
Model ID: gpt-oss-120b
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 131.1k
Input: $0.35 /M tokens
Cached: not listed
Output: $0.75 /M tokens

Qwen3 235B A22B Instruct 2507
Family: alibaba
Model ID: qwen3-235b-a22b-instruct-2507
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 262k
Input: $0.6 /M tokens
Cached: not listed
Output: $1.2 /M tokens

Qwen3 32B
Family: alibaba
Model ID: qwen3-32b
Status: Deactivated since Feb 16, 2026
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 32.8k
Input: $0.4 /M tokens
Cached: not listed
Output: $0.8 /M tokens

Llama 3.3 70B Instruct
Family: meta
Model ID: llama-3.3-70b-instruct
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 128k
Input: $0.85 /M tokens
Cached: not listed
Output: $1.2 /M tokens

Llama 3.1 8B Instruct
Family: meta
Model ID: llama-3.1-8b-instruct
Capabilities: Streaming, JSON Output
Provider: Cerebras
Context: 128k
Input: $0.1 /M tokens
Cached: not listed
Output: $0.1 /M tokens
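
Because every price in this listing is quoted per million tokens, a request's cost is the token count times the listed rate, divided by one million. The following is a minimal sketch of that arithmetic, not part of any Cerebras SDK: the names PRICES and estimate_cost are hypothetical, the rates are copied from the active models listed on this page, and the two deactivated models are left out. It ignores cached-token pricing, since no cached rate is listed for any model.

```python
from decimal import Decimal

# (input, output) prices in USD per million tokens, copied from the
# listing on this page. Deactivated models (glm-4.6, qwen3-32b) are omitted.
# PRICES and estimate_cost are illustrative names, not a provider API.
PRICES = {
    "glm-4.7": (Decimal("2.25"), Decimal("2.75")),
    "gpt-oss-120b": (Decimal("0.35"), Decimal("0.75")),
    "qwen3-235b-a22b-instruct-2507": (Decimal("0.6"), Decimal("1.2")),
    "llama-3.3-70b-instruct": (Decimal("0.85"), Decimal("1.2")),
    "llama-3.1-8b-instruct": (Decimal("0.1"), Decimal("0.1")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request for a listed model."""
    input_price, output_price = PRICES[model_id]
    total = input_price * input_tokens + output_price * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token completion.
    for model_id in PRICES:
        print(model_id, estimate_cost(model_id, 10_000, 2_000))
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding drift when totalled over many requests.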