Hello People,
Welcome to the new edition of Business Analytics Review!
Meta launches “Vibes,” an AI short-form video feed; Clarifai ships a new reasoning engine to speed up multimodal inference; and CoreWeave inks a $6.5B expansion deal with OpenAI.
Today’s Quick Recaps
What happened: OpenAI announced five new U.S. Stargate data-center sites that bring the project’s planned capacity to nearly 7 gigawatts of compute, via partnerships with Oracle and SoftBank. Read More Here
Why it matters: This materially increases the physical infrastructure available for large models, lowering latency and enabling larger, stateful agent deployments at scale — a direct lever on cost, throughput and model placement decisions for data teams. Read More Here
The takeaway: If your workflows depend on large LLM inference, plan for distributed deployment options (regional endpoints + hybrid cloud) and benchmark for throughput (tokens/sec) and cost per 1M tokens now — those metrics will move as Stargate capacity comes online.
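If you want a concrete starting point, here is a minimal probe you could adapt, assuming an OpenAI-style chat-completions endpoint; the URL, key, model name, and price are placeholders for your own provider and rate card:

import statistics
import time

import requests

ENDPOINT = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "YOUR_KEY"
MODEL = "your-model"
PRICE_PER_1M_TOKENS = 2.00  # USD per 1M output tokens; replace with your rate card

def probe(prompt, runs=5):
    latencies, tok_rates, costs = [], [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        r = requests.post(
            ENDPOINT,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
            timeout=60,
        )
        elapsed = time.perf_counter() - t0
        tokens = r.json()["usage"]["completion_tokens"]  # OpenAI-style usage block
        latencies.append(elapsed)
        tok_rates.append(tokens / elapsed)
        costs.append(tokens * PRICE_PER_1M_TOKENS / 1_000_000)
    print(f"median latency: {statistics.median(latencies):.3f} s")
    print(f"median throughput: {statistics.median(tok_rates):.1f} tokens/sec")
    print(f"median cost per request: ${statistics.median(costs):.5f}")

probe("Summarize the key metrics to benchmark before a regional migration.")

Run it before and after any regional migration and keep the numbers next to your cost dashboard, so the Stargate-driven shifts show up in your own data rather than vendor claims.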
Deep Dive - Why Five New Stargate Sites Matter for Data Teams
OpenAI’s announcement (backed by partners Oracle and SoftBank and supported by major infrastructure deals) pushes Stargate’s planned footprint to roughly 7 GW, a new class of on-demand compute capacity. This isn’t just PR; it changes where and how you run inference and training: regional endpoints reduce network egress, and dedicated campuses enable lower-latency cross-GPU model serving. Explore More
The Problem: Cloud latency, variable availability, and rising per-token inference costs make productionizing large models (multi-agent systems, real-time retrieval-augmented apps) expensive and unreliable across regions.
The Solution: A coordinated infrastructure build (Stargate) that adds local capacity, combined with orchestration & hybrid deployments to place model shards and caches near users.
Infrastructure Partnership: OpenAI + Oracle + SoftBank — pooled capital and site selection accelerate capacity rollout and provide anchor customers for large campus builds.
Scale & Power: ~7 GW of planned capacity means a step-change in available GPU-hours for large models, which translates into improved throughput and potential cost-per-inference reductions.
Operational Approach: Expect more regional edge endpoints, tighter networking between cloud providers, and on-campus caches to minimize cross-region egress for retrieval-augmented pipelines.
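To make the placement idea concrete, here is a hedged sketch of a region-aware router that keeps retrieval on the same campus as the GPUs; the endpoint map, the trivial placement policy, and the cached lookup are all illustrative stand-ins, not any provider’s actual API:

from functools import lru_cache

REGIONAL_ENDPOINTS = {  # hypothetical endpoint map
    "us-east": "https://us-east.llm.example.com",
    "us-west": "https://us-west.llm.example.com",
    "eu-west": "https://eu-west.llm.example.com",
}

def nearest_region(user_region):
    # Trivial placement policy: exact match, else a static fallback region.
    return user_region if user_region in REGIONAL_ENDPOINTS else "us-east"

@lru_cache(maxsize=10_000)
def cached_retrieval(query):
    # Stand-in for a vector-store lookup colocated with the GPU pool,
    # so repeated retrievals never cross regions.
    return f"top-k passages for: {query}"

def answer(query, user_region):
    region = nearest_region(user_region)
    context = cached_retrieval(query)
    # In production this would POST to REGIONAL_ENDPOINTS[region];
    # here we just return the routing decision and the cached context.
    return {"endpoint": REGIONAL_ENDPOINTS[region], "context": context}

print(answer("What changed with the Stargate expansion?", "eu-west"))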
The Results Speak for Themselves:
Baseline: Typical cross-region LLM latency: 200–800 ms per request (varies).
After Optimization: Regional colocated endpoints can lower median latency to <50 ms for many interactive tasks (expected as capacity comes online). (Operational delta depends on topology; monitor with synthetic probes.)
Business Impact: Reduced latency and higher throughput directly increase user engagement in interactive apps and cut per-session compute costs — at scale this can mean millions saved annually for large consumer apps (value depends on QPS and model size).
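The synthetic-probe idea mentioned above can be as simple as the sketch below: hit each regional endpoint with a tiny fixed request on a schedule and track median latency per region. The URLs are placeholders, and a real deployment would run this from a scheduler rather than a loop.

import statistics
import time
from collections import defaultdict

import requests

PROBE_TARGETS = {  # hypothetical health-check URLs per region
    "us-east": "https://us-east.llm.example.com/health",
    "us-west": "https://us-west.llm.example.com/health",
}
history = defaultdict(list)

def run_probe_round():
    for region, url in PROBE_TARGETS.items():
        t0 = time.perf_counter()
        try:
            requests.get(url, timeout=5)
            history[region].append(time.perf_counter() - t0)
        except requests.RequestException:
            history[region].append(float("inf"))  # count failures separately

for _ in range(10):  # in production: a cron job or scheduler, not a loop
    run_probe_round()
    time.sleep(1)

for region, samples in history.items():
    ok = [s for s in samples if s != float("inf")]
    print(region, f"median {statistics.median(ok) * 1000:.0f} ms" if ok else "all probes failed")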
BAR Pro - Subscribe and Get Daily Content
💵 50% Off All Live Bootcamps and Courses
📬 Daily Business Briefings; every edition has a different theme.
📘 1 Free E-book Every Week
🎓 FREE Access to All Webinars & Masterclasses
📊 Exclusive Premium Content
What We’re Testing This Week - Optimizing retrieval-augmented inference for lower latency and cost
Brief intro: With new large campus capacity coming online, practical gains come from smarter placement of retrieval caches and quantized models. We’re testing two approaches:
Sharded Local Serving (VShard) — Break the model into smaller shards and serve hot shards on local GPU pools.
Practical tip: pin frequently used attention layers to local memory and offload rarely used layers to regional pools; measure tokens/sec changes.
Quantized + Distilled Edge Models — Use INT8/INT4 quantization and distilled 6B models as front-line responders, falling back to the full model for complex queries.
Practical tip: run A/B latency vs. accuracy tests on a held-out set; aim for <5% degradation in top-1 accuracy for a 2–4x throughput gain (a minimal routing sketch follows this list).
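For the second approach, the routing sketch below shows the front-line/fallback pattern in its simplest form; both model calls and the confidence heuristic are toy stand-ins for your own distilled and full models:

def small_model(query):
    # Stand-in for an INT8/INT4 distilled model; returns (answer, confidence).
    answer = f"[distilled answer to: {query}]"
    confidence = 0.92 if len(query.split()) < 20 else 0.40  # toy heuristic
    return answer, confidence

def full_model(query):
    # Stand-in for the full-size model served from the regional GPU pool.
    return f"[full-model answer to: {query}]"

def route(query, threshold=0.8):
    answer, conf = small_model(query)
    if conf >= threshold:
        return {"answer": answer, "served_by": "distilled-edge"}
    return {"answer": full_model(query), "served_by": "full-model-fallback"}

# A/B harness idea: run route() and full_model() over the same held-out set,
# then compare accuracy and latency; keep the threshold where degradation stays <5%.
print(route("Quick FAQ-style question?"))
print(route("A long, multi-part analytical question that needs deeper reasoning across several documents."))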
Recommended Tools for Today
CoreWeave
GPU cloud for large model training and serving; recently expanded OpenAI pact (large capacity and enterprise procurement options). Reuters
Oracle Cloud Infrastructure (OCI)
Enterprise data center partnership for Stargate sites — strong for colocated campus interconnects and predictable pricing. Data Center Dynamics
Weave/Datadog (monitoring)
Integrate end-to-end latency and token-cost dashboards to measure regional performance and cost per million tokens. DataDog
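As an illustration of the dashboard idea, the sketch below assumes the official datadog Python client and a local DogStatsD agent; the metric names, tags, and placeholder price are our own choices, not a Datadog convention:

from datadog import initialize, statsd

initialize(statsd_host="localhost", statsd_port=8125)

def record_request(region, model, latency_s, tokens, price_per_1m=2.00):
    # Emit per-request latency, token count, and cost so dashboards can
    # roll up cost per million tokens and latency by region and model.
    tags = [f"region:{region}", f"model:{model}"]
    statsd.histogram("llm.request.latency", latency_s, tags=tags)
    statsd.histogram("llm.request.tokens", tokens, tags=tags)
    statsd.histogram("llm.request.cost_usd", tokens * price_per_1m / 1_000_000, tags=tags)

record_request("us-east", "your-model", latency_s=0.12, tokens=850)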
3 Things to Know Before Signing Off
• OpenAI/Oracle/SoftBank — Stargate adds five U.S. sites to reach ~7 GW planned capacity; major impact on where you place inference. Read More
• Meta — launched “Vibes,” an AI short-form video feed inside Meta AI to boost AI-generated video creation and distribution. Read More
• CoreWeave — expanded its OpenAI contract (up to $6.5B), strengthening on-demand GPU supply chains. Read More
Follow Us:
LinkedIn | X (formerly Twitter) | Facebook | Instagram
If you enjoyed this edition, please like it and share your thoughts in the comments.
EXCLUSIVE LIMITED-TIME OFFER: 50% OFF Newsletter Sponsorships!
Get 50% off all standard sponsorship prices.