Cerebras Provider

High-performance Cerebras inference with very high throughput.

Available Models

GLM-4.7

Creator: glm
Model ID: glm-4.7
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k tokens
Input: $2.25 /M tokens
Cached: price not listed
Output: $2.75 /M tokens
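Prices on this page are quoted per million tokens, so the cost of a single request is (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below applies that formula to the GLM-4.7 prices listed above. The function name `estimate_cost` is hypothetical. Because no cached-token price is listed, the sketch assumes every input token is billed at the regular input rate.

```python
# Hypothetical helper: estimates the USD cost of one request from the
# per-million-token prices listed for GLM-4.7 on Cerebras.
# Assumption: cached-token pricing is not listed, so all input tokens
# are billed at the regular input rate.

INPUT_PRICE_PER_M = 2.25   # USD per 1M input tokens (GLM-4.7)
OUTPUT_PRICE_PER_M = 2.75  # USD per 1M output tokens (GLM-4.7)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = INPUT_PRICE_PER_M,
                  output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens and 2,000 completion tokens
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

For example, 10,000 input tokens and 2,000 output tokens come to (22,500 + 5,500) / 1,000,000 = $0.028.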

GLM-4.6

Status: Model Deactivated (since Jan 20, 2026)
Creator: glm
Model ID: glm-4.6
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k tokens
Input: $2.25 /M tokens
Cached: price not listed
Output: $2.75 /M tokens

GPT OSS 120B

Creator: openai
Model ID: gpt-oss-120b
Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 131.1k tokens
Input: $0.35 /M tokens
Cached: price not listed
Output: $0.75 /M tokens

Qwen3 235B A22B Instruct 2507

Creator: alibaba
Model ID: qwen3-235b-a22b-instruct-2507
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 262k tokens
Input: $0.6 /M tokens
Cached: price not listed
Output: $1.2 /M tokens

Qwen3 32B

Status: Model Deactivated (since Feb 16, 2026)
Creator: alibaba
Model ID: qwen3-32b
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 32.8k tokens
Input: $0.4 /M tokens
Cached: price not listed
Output: $0.8 /M tokens
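Two models on this page are marked as deactivated: GLM-4.6 (since Jan 20, 2026) and Qwen3 32B (since Feb 16, 2026). A client that stores model IDs in configuration might check them against those dates before sending a request. The sketch below is a minimal, hypothetical availability check. It assumes "deactivated since" means the model is unavailable from that date onward and that any model not listed as deactivated is active. The dictionary and function names are illustrative only.

```python
from datetime import date

# Deactivation dates as listed on this page. Assumption: any model ID
# not in this dict is active.
DEACTIVATED_SINCE = {
    "glm-4.6": date(2026, 1, 20),
    "qwen3-32b": date(2026, 2, 16),
}


def is_available(model_id: str, on: date) -> bool:
    """Return False if the model was deactivated on or before `on`."""
    cutoff = DEACTIVATED_SINCE.get(model_id)
    return cutoff is None or on < cutoff


if __name__ == "__main__":
    today = date(2026, 2, 1)
    for model_id in ["glm-4.7", "glm-4.6", "qwen3-32b"]:
        print(model_id, is_available(model_id, today))
```

Keeping the dates in one table means a newly deactivated model requires only a one-line change.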

Llama 3.3 70B Instruct

Creator: meta
Model ID: llama-3.3-70b-instruct
Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 128k tokens
Input: $0.85 /M tokens
Cached: price not listed
Output: $1.2 /M tokens

Llama 3.1 8B Instruct

Creator: meta
Model ID: llama-3.1-8b-instruct
Capabilities: Streaming, JSON Output
Provider: Cerebras
Context: 128k tokens
Input: $0.1 /M tokens
Cached: price not listed
Output: $0.1 /M tokens
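The active models above differ in capabilities, context window, and price. One way to use the listing is to pick the cheapest model that supports the features a request needs and has room for it. The sketch below encodes the five active models from this page and selects one that way. It rests on several assumptions. Context sizes are taken from the rounded values shown (for example, "131.1k" becomes 131,100), so the real limits may differ slightly. A request is assumed to fit when input plus output tokens do not exceed the context window. Cached-token pricing is ignored because it is not listed. `ModelListing`, `CATALOG`, and `cheapest` are hypothetical names, not part of any Cerebras API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelListing:
    model_id: str
    capabilities: frozenset
    context_tokens: int   # approximate, from the rounded page value
    input_per_m: float    # USD per 1M input tokens
    output_per_m: float   # USD per 1M output tokens


# Active models from this page; deactivated ones are omitted.
CATALOG = [
    ModelListing("glm-4.7", frozenset({"Streaming", "Tools", "Reasoning", "JSON Output"}), 200_000, 2.25, 2.75),
    ModelListing("gpt-oss-120b", frozenset({"Streaming", "Tools", "Reasoning", "JSON Output"}), 131_100, 0.35, 0.75),
    ModelListing("qwen3-235b-a22b-instruct-2507", frozenset({"Streaming", "Tools", "JSON Output"}), 262_000, 0.6, 1.2),
    ModelListing("llama-3.3-70b-instruct", frozenset({"Streaming", "Tools", "JSON Output"}), 128_000, 0.85, 1.2),
    ModelListing("llama-3.1-8b-instruct", frozenset({"Streaming", "JSON Output"}), 128_000, 0.1, 0.1),
]


def cheapest(required: set, input_tokens: int, output_tokens: int) -> "ModelListing | None":
    """Return the lowest-cost listing that has every required capability
    and whose context window holds input + output tokens (assumption)."""
    candidates = [
        m for m in CATALOG
        if required <= m.capabilities
        and input_tokens + output_tokens <= m.context_tokens
    ]
    if not candidates:
        return None
    # Cost is compared in "USD x 1M" units; dividing by 1M would not change the order.
    return min(
        candidates,
        key=lambda m: input_tokens * m.input_per_m + output_tokens * m.output_per_m,
    )


if __name__ == "__main__":
    pick = cheapest({"Tools"}, 10_000, 1_000)
    print(pick.model_id if pick else "no match")
```

With these inputs, gpt-oss-120b is the cheapest tool-capable model. A 160,000-token request that needs Reasoning fits only GLM-4.7's 200k window.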