AWS and Cerebras Announce March 13 Collaboration to Deliver Ultra‑Fast AI Inference

AMZN
March 14, 2026

AWS and Cerebras Systems announced a partnership on March 13, 2026 that will bring a new, disaggregated inference architecture to Amazon Bedrock. The solution combines AWS’s Trainium‑powered servers, Cerebras’s CS‑3 wafer‑scale chips, and Elastic Fabric Adapter networking, allowing inference workloads to be split between the two specialized platforms.

The collaboration is a strategic move that positions AWS to compete more directly with Microsoft Azure and Google Cloud, and to challenge Nvidia’s dominance in the inference market. By integrating the CS‑3 chips into Bedrock, AWS can offer customers a faster, more efficient inference path for generative‑AI and large‑language‑model workloads, potentially reducing latency and cost per token for enterprise users.

The core of the new architecture is an “inference disaggregation” technique that routes the prefill stage of a model to the CS‑3 wafer‑scale chip and the decode stage to Trainium. The two stages stress hardware differently: prefill processes the entire prompt at once and is compute‑intensive, while decode generates one token at a time and is bound by memory bandwidth. Splitting them lets each piece of hardware perform the work it is best suited for, delivering significant speed and efficiency gains over traditional GPU‑only solutions.
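The shape of this split can be illustrated with a minimal sketch. Neither company has published an API for the new architecture, so every class and function below is hypothetical: a stand‑in prefill engine processes the full prompt and hands its state (the KV cache) to a stand‑in decode engine, mirroring how work would flow from the CS‑3 to Trainium over the EFA fabric.

```python
# Conceptual sketch of prefill/decode disaggregation.
# All names are hypothetical illustrations, NOT a real AWS or Cerebras API.

class PrefillEngine:
    """Stands in for the CS-3: processes the whole prompt in one
    compute-heavy pass, returning the KV cache and the first token."""
    def run(self, prompt_tokens):
        kv_cache = [("kv", t) for t in prompt_tokens]  # placeholder state
        first_token = len(prompt_tokens)               # dummy "model output"
        return kv_cache, first_token

class DecodeEngine:
    """Stands in for Trainium: generates one token at a time,
    extending the KV cache handed over from the prefill stage."""
    def step(self, kv_cache, last_token):
        kv_cache.append(("kv", last_token))
        return last_token + 1                          # dummy next token

def generate(prompt_tokens, max_new_tokens):
    prefill, decode = PrefillEngine(), DecodeEngine()
    # Stage 1: prompt processing on the prefill engine.
    kv_cache, token = prefill.run(prompt_tokens)
    output = [token]
    # In the announced design, the KV cache would cross the EFA fabric here.
    # Stage 2: autoregressive decoding on the decode engine.
    for _ in range(max_new_tokens - 1):
        token = decode.step(kv_cache, token)
        output.append(token)
    return output

print(generate([1, 2, 3], max_new_tokens=4))  # → [3, 4, 5, 6]
```

The key design point the sketch captures is the single hand‑off: the prefill engine touches the prompt once, and everything the decode engine needs afterward travels in the KV cache, which is why the interconnect between the two platforms matters so much.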

Industry analysts view the partnership as a key development in the AI‑cloud race. It expands AWS’s AI infrastructure portfolio, strengthens its competitive edge, and signals a broader shift toward custom silicon and specialized inference hardware across hyperscalers.

"Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real‑time coding assistance and interactive applications. What we’re building with Cerebras solves that: by splitting the inference workload across Trainium and CS‑3, and connecting them with Amazon’s Elastic Fabric Adapter, each system does what it’s best at. The result will be inference that’s an order of magnitude faster and higher performance than what’s available today," said David Brown, Vice President of Compute & ML Services at AWS.

"Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base. Every enterprise around the world will be able to benefit from blisteringly fast inference within their existing AWS environment," added Andrew Feldman, Founder and CEO of Cerebras Systems.

The content on EveryTicker is for informational purposes only and should not be construed as financial or investment advice. We are not financial advisors. Consult with a qualified professional before making any investment decisions. Any actions you take based on information from this site are solely at your own risk.