On the Shifting Global Compute Landscape

Community Article Published October 29, 2025

Summary

The status quo of AI chip usage, once almost entirely U.S.-based, is changing. China’s immense progress in open-weight AI development is now being met with rapid domestic AI chip development. In the past few months, inference for highly performant open-weight AI models in China has started to be powered by chips such as Huawei’s Ascend and Cambricon, with some models starting to be trained on domestic chips.

There are two large implications, for policymakers and for AI researchers and developers respectively: U.S. export controls correlate with expedited Chinese chip production, and chip scarcity in China likely incentivized many of the innovations that are now open-sourced and shaping global AI development.

China’s chip development correlates strongly with tighter export controls from the U.S. Facing uncertain chip access, Chinese companies have innovated in both chip production and algorithmic advances for compute-efficient models. Out of necessity, decreased reliance on NVIDIA has led to domestic full-stack AI deployments, as seen with Alibaba.

Compute limitations likely incentivized advancements in architecture, infrastructure, and training. Innovations in compute efficiency from open-weight leaders include DeepSeek’s introduction of Multi-head Latent Attention (MLA) and Group Relative Policy Optimization (GRPO). A culture of openness encouraged knowledge sharing, and improvements in compute efficiency contributed to lower inference costs, evolving the AI economy.

Domestic silicon’s proven sufficiency has sparked demand, and models are beginning to be optimized for domestic chips. In parallel, software platforms are shifting as alternatives to NVIDIA’s CUDA emerge and challenge NVIDIA at every layer; synergy between AI developers and chip vendors is creating a new, fast-evolving software ecosystem.

The shifting global compute landscape will continue to shape open source, training, deployment, and the overall AI ecosystem.

The State of Global Compute

Utility of and demand for advanced AI chips have followed an upward trajectory and are predicted to continue increasing. Over the past few years, NVIDIA’s chips maintained dominance; recently, new players are garnering attention. China has had long-term plans for domestic production, with self-sufficiency goals and large monetary and infrastructural investments. Now, the next generation of Chinese open-weight AI models is starting to be powered by Chinese chips.

Broader trends worldwide are intensifying, with both the U.S. and China citing national security in chip and rare earth resource restrictions. As U.S. export controls tightened, the rollout of Chinese-produced chips seemingly accelerated. The rise of China’s domestic chip industry is fundamentally changing norms and expectations for global AI training and deployment, with more models being optimized for Chinese hardware and compute-efficient open-weight models picking up in adoption. In the last few months, Chinese-produced chips have already started to power inference for popular models and are beginning to power training runs. 

These changes can affect everything from techniques used in training, to optimization for both compute efficiency and specific hardware, to lower inference costs, to the recent open source boom. They could shift both U.S. trade policy and China's approach to global deployment, moving AI advancement from an American-centered global ecosystem to one with China at the center.

The Beginning of a Rewiring 

China’s domestic chip production had been in progress for years before the modern AI boom. One of the most notable advanced chip lines, Huawei’s Ascend, initially launched in 2018 but expanded in deployment starting in 2024 and increasingly throughout 2025. Other notable chips come from Cambricon Technologies and Baidu (Kunlun).

In 2022, the Biden administration established export controls on advanced AI chips, a move targeting China's access to high-end GPUs. The strategy was intended to curb the supply of high-end NVIDIA GPUs, stalling China’s AI progress. Yet, what began as a blockade has paradoxically become a catalyst. The intent to build a wall instead laid the foundation for a burgeoning industry.

Chinese AI labs, initially spurred by a fear of being cut off, have responded with a surge of innovation, producing both world-class open-weight models like Qwen, DeepSeek, GLM, and Kimi, and domestic chips that are increasingly powering both training and inference for those models. There is a growing relationship between chip makers and open source, as the ability to locally run open-weight models also leads to mutually beneficial feedback. This is leading to e.g. more Ascend-optimized models.

China’s advancements in both open source and compute are shifting the global landscape. Martin Casado, a partner at a16z, noted that a significant portion of U.S. startups are now building on open-weight Chinese models, and a recent analysis shows Chinese open-weight models leading in popularity on LMArena.

The vacuum created by the restrictions has ignited a full-stack domestic effort in China, transforming once-sidelined local chipmakers into critical national assets and fostering intense collaboration between chipmakers and researchers to build a viable non-NVIDIA ecosystem. This is no longer a hypothetical scenario; with giants like Baidu and Ant Group successfully training foundation models on domestic hardware, a parallel AI infrastructure is rapidly materializing, directly challenging NVIDIA’s greatest advantage: its developer-centric software ecosystem. 

See the Appendix for a detailed timeline of chip controls and effects on hardware development and deployment.

The Reaction: Powering Chinese AI

The 2022 ban, coinciding with the global shockwave of ChatGPT, triggered a panic across China's tech landscape. The safe default of abundant NVIDIA compute was gone. Claims of NVIDIA chip smuggling arose. Still, the ban had destroyed the research community’s trust; faced with the prospect of being left permanently behind, researchers started to innovate out of necessity. What emerged was a new, pragmatic philosophy in which a “non-NVIDIA first” approach became rational, not merely ideological.

How China’s Compute Landscape Catalyzed the Cambrian Explosion of Open Models

Chinese labs took a different path, focusing on architectural efficiency and open collaboration. Open source, once a niche interest, became the new norm, a pragmatic choice for rapidly accelerating progress through shared knowledge. This paradigm allows organizations to leverage existing, high-quality pre-trained models as a foundation for specialized applications through post-training, dramatically reducing the compute burden. A primary example is the DeepSeek R1 model, which required less than $300,000 for post-training on its V3 architecture, thereby lowering the barrier for companies to develop sophisticated models. While this excludes the cost of the full base model, the cost reduction for the reasoning model is substantial. Algorithmic advances that improve memory efficiency, such as the Multi-head Latent Attention (MLA) used in DeepSeek’s V3 model, likely incentivized by compute limitations, are a large part of January 2025’s “DeepSeek moment”.
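To make MLA's memory saving concrete, here is a minimal numpy sketch of its core idea: caching one shared low-rank latent per token instead of full per-head keys and values, and reconstructing K (and V) from it at attention time. The dimensions below are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

# Hypothetical dimensions for illustration only (not DeepSeek's real sizes).
n_heads, d_head, d_latent = 32, 128, 512
seq_len = 4096

# Standard multi-head attention caches full keys AND values per token.
kv_cache_standard = seq_len * 2 * n_heads * d_head  # elements stored

# MLA caches one shared low-rank latent per token; keys and values are
# recovered from it with learned up-projections when attention runs.
kv_cache_mla = seq_len * d_latent

rng = np.random.default_rng(0)
hidden = rng.standard_normal((seq_len, n_heads * d_head))
w_down = rng.standard_normal((n_heads * d_head, d_latent)) / np.sqrt(n_heads * d_head)
w_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)

latent = hidden @ w_down           # (seq_len, d_latent): all that is cached
k_reconstructed = latent @ w_up_k  # keys rebuilt on the fly from the latent

print(f"standard KV cache: {kv_cache_standard} elements")
print(f"MLA latent cache:  {kv_cache_mla} elements "
      f"({kv_cache_standard / kv_cache_mla:.0f}x smaller)")
```

With these made-up sizes the cache shrinks 16x; the real saving depends on the chosen latent dimension, but the mechanism is the same.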

That moment also catalyzed a larger movement for Chinese companies, including those that were closed-source, to upend strategies and invest in compute-efficient open-weight models. These models’ lower costs result from many variables, efficiency among them; as Chinese companies lowered compute and inference costs, they passed those savings on to users, further evolving the overall AI economy.

  • DeepSeek's (Open) Weight: In addition to the high performance and low cost that created waves in early 2025, DeepSeek’s pioneering role as an openly compute-efficient frontier lab is a large part of what has made the company and its models mainstays. These advances can likely be attributed to innovating in a compute-scarce environment. Funded by investor Wenfeng Liang with a "pure pursuit of open source and AGI," DeepSeek became the most-followed organization on Hugging Face. Its highly detailed technical papers, including a groundbreaking _Nature_-published study on its R1 model, set a new standard for scientific communication. While a large draw is its open weights over its API, in 2024 DeepSeek slashed its API prices to 1/30th of OpenAI's, triggering a price war. In 2025, DeepSeek-OCR further proved the lab's prowess in compute efficiency, and with the release of DeepSeek-V3.2-Exp it passed a further 50%+ discount on to users. Notably, the V3.2-Exp model was also released with day-zero support for deployment on Chinese chips (Huawei’s Ascend and Cambricon). This release also marks an emphasis on CUDA alternatives and exemplifies a full-stack hardware-software AI infrastructure in deployment.

  • Qwen's Ecosystem Dominance: Alibaba is on a path to control a full stack of high-performance models and in-house designed chips, reducing reliance on NVIDIA. The company’s Qwen family became a primary resource for global open-source research. Its permissive Apache 2.0 license enabled commercial use, a barrier for comparable models that often used more restrictive custom licenses, and led to over 100,000 derivative models on Hugging Face. Alibaba recently unveiled improved chips for better inference, with its PPU being integrated into domestic infrastructure projects.

  • An Industry-Wide Tidal Wave of Low-Cost, High Efficiency: More open-weight models were released boasting SotA performance at significantly lower pricing. Zhipu AI returned with its GLM-4.5 and 4.6 open-weight releases, with both quickly reaching top trending on Hugging Face and 4.6 becoming the top-performing open-weight model on LMArena. GLM’s API pricing was lowered repeatedly, with cost-effectiveness extending to a $3/month plan positioned as an alternative to Claude Code at 1/5 of the price. While full transparency on the pricing decisions is lacking, efficiency likely plays a strong role.

  • Seeds of Training Fully on Domestic Chips: While many upcoming chips are designed primarily for inference, more models are hinting at being trained on domestic chips. Ant Group pioneered training its Ling model on complex heterogeneous clusters of NVIDIA, Ascend, and Cambricon chips. Baidu successfully conducted continuous pre-training on a cluster of over 5,000 domestic Kunlun P800 accelerators, producing its Qianfan VL model.

 

Advances in Compute-Constrained Environments Pushing the Technical Frontier

The innovation was not confined to model weights alone; it went deep into the software and hardware stack.

  • Architectural Exploration: Grassroots independent researchers, such as Peng Bo, have championed Linear Attention as a potential successor to the Transformer. This approach, sometimes dubbed the "revenge of the RNN" and seen in models like RWKV, has been scaled into commercial-grade models like MiniMax M1 and Qwen-Next by Chinese labs willing to bet on high-risk, high-reward research. Meanwhile, DeepSeek has taken a different path by iterating on the original Transformer architecture. Its innovations, such as Multi-head Latent Attention (MLA) and the DeepSeek Sparse Attention (DSA) introduced with its V3.2 model, are designed to significantly reduce computational costs during inference without sacrificing performance, while also accelerating Reinforcement Learning (RL) exploration through faster rollouts. Highly performant proprietary models’ architectures are not public and are therefore difficult to compare.

  • Open Infrastructure: In a radical departure from corporate secrecy, labs shared their deepest engineering secrets. The Kimi team's work on the Mooncake serving system formalized prefill/decoding disaggregation. StepFun's Step3 enhanced this with Attention-FFN Disaggregation (AFD). Baidu published detailed technical reports on overcoming engineering challenges in its Ernie 4 training, while ByteDance's Volcengine contributed verl, an open-source library that puts production-grade RL training tools into the community's hands. What was once proprietary know-how became community knowledge, fueling a self-iterating flywheel of progress.

  • Training breakthroughs: DeepSeek’s DeepSeekMath paper introduced a novel reinforcement learning (RL) methodology, Group Relative Policy Optimization (GRPO), which significantly reduces compute costs compared to the prior standard, Proximal Policy Optimization (PPO), while stabilizing training and even improving accuracy. GRPO has since been featured in a DeepLearning.AI course, built on by Meta’s researchers in their Code World Model, and lauded by OpenAI research lead Jerry Tworek as having "in a large way accelerated RL research program of most US research labs".
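The core of GRPO's compute saving can be sketched in a few lines: instead of training a separate value network as PPO does, advantages are computed relative to a group of completions sampled for the same prompt. A minimal illustration (the reward values are invented, and this omits the clipped policy-gradient loss that consumes these advantages):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward by the mean and std of its own group. This replaces the
    learned value baseline of PPO, which is GRPO's main compute saving."""
    rewards = np.asarray(rewards, dtype=np.float64)
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)

# One prompt, a group of 4 sampled completions scored by a reward model
# (scores are made up for illustration).
group_rewards = [1.0, 0.0, 0.5, 0.5]
adv = grpo_advantages(group_rewards)
print(adv)  # completions above the group mean get positive advantage
```

The advantages are then plugged into a PPO-style clipped objective per token; since the baseline comes from the group itself, no critic forward/backward passes are needed.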

With all the work aggregated, on public leaderboards like LMSYS's Chatbot Arena, models like DeepSeek R1, Kimi K2, Qwen and GLM-4.6 now frequently appear near the top alongside U.S. models. Innovation under constraints resulted in leaps.
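The linear-attention idea championed above can also be made concrete: with a positive feature map, causal attention becomes a running sum whose per-token cost is constant, instead of growing with the prefix length as in softmax attention. A minimal numpy sketch, using the common elu+1 feature map; this is illustrative, not any specific model's implementation:

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a simple positive feature map used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Causal linear attention in its recurrent O(n) form: a running
    state S accumulates outer(phi(k_j), v_j), so each new token costs
    O(d * d_v) regardless of how long the sequence already is."""
    n, d = q.shape
    dv = v.shape[1]
    S = np.zeros((d, dv))    # running sum of key-value outer products
    z = np.zeros(d)          # running normalizer (sum of phi(k_j))
    out = np.zeros((n, dv))
    for i in range(n):
        qi, ki = phi(q[i]), phi(k[i])
        S += np.outer(ki, v[i])
        z += ki
        out[i] = (qi @ S) / (qi @ z + 1e-8)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # one output vector per position
```

The same recurrence is what lets RNN-flavored models like RWKV decode with constant memory, which is exactly the property that matters in compute-constrained serving.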

The Aftermath: Hardware, Software and Soft Power

When AI models are trained and deployed, they are often optimized for certain types of chips. More than the hardware itself, NVIDIA’s software universe has been a reliable friend to the global AI ecosystem. 

The deep-learning revolution, sparked by AlexNet's 2012 victory on NVIDIA GPUs, created a symbiotic relationship. NVIDIA’s Compute Unified Device Architecture (CUDA), cuDNN, and Collective Communications Library (NCCL) have long formed the bedrock of AI research. An entire ecosystem, including popular frameworks like PyTorch and Hugging Face Transformers, was heavily optimized for CUDA. A generation of developers grew up inside this ecosystem, which created enormous switching costs.

A software ecosystem once reluctant to switch from existing platforms is now exploring elsewhere, which could be the first step away from U.S. reliance. The software side has evolved with the rise of new chips; developers are optimizing for and deploying their latest models on new parallel platforms.

From Sufficient to Demanded

Prior to 2022, domestic chips from companies like Cambricon and Huawei (Ascend) were rarely taken seriously. They were catapulted to the center of the domestic AI ecosystem in 2025, when SiliconFlow first demonstrated DeepSeek's R1 model running seamlessly on Huawei's Ascend cloud a couple of weeks after the R1 release. This created a domino effect, sparking a market-wide race to serve domestic models faster and better on domestic chips. Fueled by the entire ecosystem and not just DeepSeek alone, the Ascend's support matrix quickly expanded. This proved domestic silicon was sufficient and ignited massive demand. Notably, Huawei's Ascend had zero-day integration with the release of DeepSeek V3.2, a level of collaboration previously unimaginable.

Domestic Synergy

Researchers began co-developing with domestic chip vendors, providing direct input and solving problems collaboratively. This synergy creates a development ecosystem tailored for Large Language Models (LLMs) that evolves much faster than NVIDIA’s CUDA. 

A new generation of younger researchers, trained in this multi-vendor world, emerged without the old bias that domestic hardware is inferior to NVIDIA's chips. This collaborative approach has already resulted in adoption. The documentation for the DeepSeek-V3.1 model notes that its new FP8 precision format explicitly aims “for next-gen domestic chips,” a clear example of hardware-aware model co-design. Its successor, DeepSeek-V3.2, took this principle further by baking in TileLang-based kernels designed for portability across multiple hardware vendors.

A New Software Landscape

The CUDA ecosystem is now being challenged at every layer. Open-source projects like FlagGems from BAAI and TileLang are creating backend-neutral alternatives to CUDA and cuDNN. Communication stacks like Huawei Collective Communication Library (HCCL) and others are providing robust substitutes for NCCL. The ecosystem is substantially different from three years ago, which will have future reverberations globally.

Looking Ahead

Adaptations to geopolitical negotiations, resource limitations, and cultural preferences have led to leaps in both China’s development of highly performant AI and now competitive domestic chips. U.S. policy has changed throughout administrations, from prohibition to a revenue-sharing model, while China responds with a combination of industrial policy and international trade law. Researchers and developers have innovated and adjusted. The effects on open source, training, and deployment point to shifts in software dependencies, compute efficiency innovations that shape development globally, and a self-sufficient Chinese AI ecosystem.

China’s domestic AI ecosystem is accelerating, with companies like Moore Threads, MetaX, and Biren racing toward IPOs. Cambricon, once struggling, has seen its valuation soar. Whether this new chip ecosystem will expand globally remains to be seen.

The future of the global chip ecosystem, and therefore the future of AI progress, has become a key item for upcoming leadership talks. The question is no longer if China can build its own ecosystem, but how far it will go. 

Acknowledgements 

Thank you to Adina Yakefu, Nathan Lambert, Matt Sheehan, and Scott Singer for their feedback on earlier drafts. Any errors remain the authors’ responsibility. 

Appendix: A Timeline of Chip Usage and Controls 

Before 2022, U.S. restrictions were targeted toward specific supercomputing entities. Policy then evolved as regulators and industry adapted.

  • The Initial Moves (October 2022): 

    • Chips such as Ascend are nascent while NVIDIA dominates the global and Chinese market. 

    • The Commerce Department’s Bureau of Industry and Security (BIS) released its "advanced computing" controls in order to address U.S. national security and foreign policy concerns. The rule established a compute threshold with an interconnect-bandwidth trigger, immediately cutting off China's access to NVIDIA’s flagship A100 and H100 GPUs. China promptly filed a WTO dispute (DS615), arguing the measures were discriminatory trade barriers.

  • The Adjustment Era (Late 2022–2023):

    • NVIDIA’s 95% share of the Chinese market began to drop quickly.

    • NVIDIA started to develop compliant variants for the Chinese market. The A800 (November 2022) and H800 (March 2023) were created with reduced chip-to-chip bandwidth to meet regulatory requirements and serve as alternatives to the A100 and H100s. The immensely popular consumer-grade RTX 4090 was also restricted, prompting the creation of a China-specific RTX 4090D.

  • Closing Gaps (Late 2023–2024): 

    • Performance in Chinese domestic chips slowly improves.

    • BIS comprehensively upgraded the framework. It removed interconnect bandwidth as a key test and introduced new metrics: Total Processing Performance (TPP) and performance density. This was a direct, successful strike against the A800/H800s. Debates expanded on export controls for the H20 and even model weights.

     

  • Shifting the Narrative (2025): 

    • Adoption of Ascend, Cambricon, and Kunlun sharply increases following January’s “DeepSeek moment”. 

    • Also in January, the Biden Administration established its AI Diffusion Rule, imposing further restrictions on both chips and select model weights amid security and smuggling concerns. Leveraging NVIDIA’s increasing presence in political spheres, CEO Jensen Huang began publicly making the case for the strategic importance of selling U.S. chips worldwide. The U.S. then issued a licensing requirement for the H20 in April 2025, prompting NVIDIA to take a $5.5 billion charge and effectively halting sales, before rescinding the AI Diffusion Rule in May 2025.

  • The Compromise (August 2025)

    • Alibaba announces a new chip for inference.

    • After intense negotiations, the Commerce Department began issuing licenses for the H20 with an unprecedented 15% revenue-sharing arrangement. But by the time the H20 was unbanned, the market had already started to change.

  • China’s Response (Late 2025): 

    • Day-zero deployment on Ascend and Cambricon begins with new DeepSeek models.

    • As the U.S. shifted to a revenue-sharing model, Beijing responded. Chinese regulators reportedly instructed firms to cancel NVIDIA orders, steering demand toward domestic accelerators under a "secure supply at home" narrative. This was followed by an anti-discrimination investigation into U.S. measures and an anti-dumping probe into U.S. analog ICs, centering chips in future leadership talks.

Community

Fascinating!

Congrats, excellent analysis. Europe is also stepping up, in a less organized way and with more difficulties. But we are all seeing the same reality: US world domination on compute will fade in the coming decades. Time to short Nvidia :)

Article author

There's a lot of space for what is considered sovereign AI! Great article from colleagues: https://huggingface.co/blog/frimelle/sovereignty-and-open-source
On domination: open question. We hope that sharing our observational analyses will provide more of the needed context for action.
