Cerebras Provider

Cerebras offers high-performance inference with ultra-fast throughput.

Available Models

GLM-4.7

glm

glm-4.7

Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k
Input: $2.25 /M tokens
Cached: not listed
Output: $2.75 /M tokens

GLM-4.6

glm (Model Deactivated)

glm-4.6

Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 200k
Deactivated since Jan 20, 2026
Input: $2.25 /M tokens
Cached: not listed
Output: $2.75 /M tokens

GPT OSS 120B

openai

gpt-oss-120b

Capabilities: Streaming, Tools, Reasoning, JSON Output
Provider: Cerebras
Context: 131.1k
Input: $0.35 /M tokens
Cached: not listed
Output: $0.75 /M tokens

Qwen3 235B A22B Instruct 2507

alibaba

qwen3-235b-a22b-instruct-2507

Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 262k
Input: $0.60 /M tokens
Cached: not listed
Output: $1.20 /M tokens

Qwen3 32B

alibaba (Model Deactivated)

qwen3-32b

Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 32.8k
Deactivated since Feb 16, 2026
Input: $0.40 /M tokens
Cached: not listed
Output: $0.80 /M tokens

Llama 3.3 70B Instruct

meta

llama-3.3-70b-instruct

Capabilities: Streaming, Tools, JSON Output
Provider: Cerebras
Context: 128k
Input: $0.85 /M tokens
Cached: not listed
Output: $1.20 /M tokens

Llama 3.1 8B Instruct

meta

llama-3.1-8b-instruct

Capabilities: Streaming, JSON Output
Provider: Cerebras
Context: 128k
Input: $0.10 /M tokens
Cached: not listed
Output: $0.10 /M tokens
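
Estimating Cost

The per-million-token prices listed above can be turned into a rough per-request estimate. The sketch below is illustrative only: the `PRICES` table and the `estimate_cost` helper are hypothetical names, not part of any Cerebras SDK, and the figures are copied from this listing for the models that are not marked as deactivated. It assumes input and output tokens are billed separately at the listed rates and ignores cached-token pricing, which the listing does not give.

```python
# Hypothetical helper: prices are USD per million tokens, copied from the
# listing above as (input_price, output_price). Deactivated models are omitted.
PRICES = {
    "glm-4.7": (2.25, 2.75),
    "gpt-oss-120b": (0.35, 0.75),
    "qwen3-235b-a22b-instruct-2507": (0.60, 1.20),
    "llama-3.3-70b-instruct": (0.85, 1.20),
    "llama-3.1-8b-instruct": (0.10, 0.10),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Assumes simple linear billing: tokens * price / 1,000,000.
    Raises KeyError if the model is not in the PRICES table.
    """
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200,000 input tokens and 50,000 output tokens on gpt-oss-120b.
    cost = estimate_cost("gpt-oss-120b", 200_000, 50_000)
    print(f"Estimated cost: ${cost:.4f}")  # prints "Estimated cost: $0.1075"
```

Under these assumptions, a request with 1M input tokens and 1M output tokens on glm-4.7 would come to $2.25 + $2.75 = $5.00. Actual billing may differ, so treat the result as an approximation.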