Let’s be blunt: this hack pays for itself. You can get the “world’s fastest” coding model behind Claude Code without paying subscription fees. The idea is simple—deploy a gateway via run.claw.cloud, back it with Cerebras’ free credits, and expose it as an OpenAI-compatible endpoint so your tools work out of the box.

Why do this? Speed and cost. Cerebras brings the inference horsepower—it’s like plugging into an ultra-fast supercharger. Claude Code provides the driving experience. Give it a clean, fast fuel line, and it flies. What used to be expensive or slow can now be both quick and affordable.

No deep Node knowledge required. If you can copy-paste in a terminal, you’re good.

Environment setup (Claude Code + Cerebras + Cloudflare Workers)

On macOS, the easiest way to install Node is Homebrew:

brew install node
npm install -g @anthropic-ai/claude-code

On Windows, install Ubuntu via WSL, then install Node and the CLI inside Ubuntu:

sudo apt update && sudo apt install -y nodejs npm
npm install -g @anthropic-ai/claude-code
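Either way, a quick check confirms both tools landed on your PATH (exact version numbers will vary):

```shell
# Verify Node, npm, and the Claude Code CLI are installed
node -v
npm -v
claude --version
```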

Three steps to an OpenAI-compatible route (run.claw.cloud + Cerebras)

Step one: sign up at cerebras.ai and generate a free API key from the console. It’s OpenAI-compatible and includes the Qwen-3-Coder-480B family for high-speed coding.

Rate limits: up to 30 requests/min and 900 requests/hour.
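Before wiring anything else up, you can smoke-test the key directly against Cerebras. This is a sketch of a standard OpenAI-style Chat Completions call; replace YOUR_CEREBRAS_KEY with the key from your console, and adjust the model name if your account lists a different identifier:

```shell
# Direct test against Cerebras' OpenAI-compatible endpoint
curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CEREBRAS_KEY" \
  -d '{
    "model": "qwen-3-coder-480b",
    "messages": [{"role": "user", "content": "Say hello in one word"}]
  }'
```

A JSON response with a `choices` array means the key works; an error body here saves you from debugging the gateway later.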

Step two: open https://console.run.claw.cloud/signin?link=IY4OLFYXS3WY, log in, search the app store for new-api, and deploy it with one click. You’ll get a domain like your-app.run.claw.cloud. On first visit, set the admin username and password (some environments cap username length—shorter is safer).

Step three: in your new admin panel, go to Channel Management and add a channel: pick any name, set the type to OpenAI, the Base URL to https://api.cerebras.ai, and the Key to the API key you just created. Save and enable it. From now on, any OpenAI-style request through your gateway is forwarded to Cerebras.


Optional: deploy claude-worker-proxy (extra speed and stability)

  1. Register a Cloudflare account and install Wrangler: https://developers.cloudflare.com/workers/wrangler
  2. Clone the open-source project, install dependencies, and log in:

git clone https://github.com/glidea/claude-worker-proxy
cd claude-worker-proxy
npm install
wrangler login

  3. Run npm run deploycf and note your Worker URL, e.g. https://claude-worker-proxy.your-subdomain.workers.dev.
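Once deployed, you can probe the Worker directly. This sketch assumes the proxy speaks the Anthropic Messages API on the client side and encodes the upstream in its path as /openai/<upstream-base-url> (the same path format used in the settings.json example later in this guide); swap in your own domains and token:

```shell
# Hypothetical smoke test of the Worker route; placeholders throughout.
# The /v1/messages path and headers follow Anthropic's Messages API,
# which the proxy is assumed to emulate.
curl https://claude-worker-proxy.your-subdomain.workers.dev/openai/https://your-app.run.claw.cloud/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk-your-new-api-token" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen-3-coder-480b",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "ping"}]
  }'
```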

Wire up Claude Code (custom OpenAI-compatible endpoint)

Claude Code can point to any OpenAI-compatible API. Just set the base URL to your gateway and keep working as usual—it’s like adding a high-grade water filter: Claude Code doesn’t care where the water comes from, as long as it’s clean and pressurized.

~/.claude/settings.json

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://claude-worker-proxy.your-subdomain.workers.dev/openai/https://your-app.run.claw.cloud/v1",
    "ANTHROPIC_API_KEY": "sk-your-new-api-token",
    "ANTHROPIC_MODEL": "qwen-3-coder-480b",
    "ANTHROPIC_SMALL_FAST_MODEL": "gpt-oss-120b",
    "API_TIMEOUT_MS": "300000"
  }
}
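If you'd rather not edit settings.json, the same variables can be exported in your shell before launching Claude Code; the values below simply mirror the JSON config, placeholders included:

```shell
# Same configuration as settings.json, expressed as shell exports
export ANTHROPIC_BASE_URL="https://claude-worker-proxy.your-subdomain.workers.dev/openai/https://your-app.run.claw.cloud/v1"
export ANTHROPIC_API_KEY="sk-your-new-api-token"
export ANTHROPIC_MODEL="qwen-3-coder-480b"
export ANTHROPIC_SMALL_FAST_MODEL="gpt-oss-120b"
export API_TIMEOUT_MS="300000"
claude
```

This is handy for trying a new gateway without touching your saved config: open a fresh terminal and the exports are gone.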

If you want to smoke-test the route from the terminal, try a standard Chat Completions call. Replace domain, key, and model as needed:

curl https://your-app.run.claw.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY_OR_CEREBRAS_KEY" \
  -d '{
    "model": "qwen-3-coder-480b",
    "messages": [
      {"role": "user", "content": "Write a Node script to list a directory with proper error handling"}
    ]
  }'

If you get a valid JSON response, your route is good. Use the same base URL and key in your editor or the Claude Code CLI and you’re set.

Troubleshooting and performance tips

A 401/403 typically means a wrong or disabled key. A missing /v1 in the base URL or a misspelled model name usually surfaces as a 404 or a model-not-found error instead. Fix those first. For long generations, raise client timeouts a bit. Ramp up concurrency gradually: treat it like warming up an engine, not flooring it from a cold start.
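When a request fails, the status code and error body usually name the problem outright. Adding -i to the earlier curl smoke test prints both (same placeholders as before):

```shell
# -i includes response headers and the HTTP status line,
# so auth errors and bad routes are visible at a glance
curl -i https://your-app.run.claw.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY_OR_CEREBRAS_KEY" \
  -d '{"model": "qwen-3-coder-480b", "messages": [{"role": "user", "content": "ping"}]}'
```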

Why this setup is worth it

You get Claude Code’s ergonomics plus Cerebras’s inference speed. Cost-wise, you shift “pay per call” into sensible free-credit usage—plenty for day-to-day coding. The bigger win: it’s a swappable fuel line. Want OpenRouter later? Another OpenAI-compatible service? Same wiring, no tooling rewrite.

Follow the 梦兽编程 WeChat official account to unlock more hacks like this.