Inference Wars: Is AMD Ready to Challenge NVIDIA’s AI Dominance or Risk Intel’s Fate?

The battle for AI dominance is about more than just chips—it's about defining the platforms and ecosystems that will shape the future of computing. Can AMD emerge as a true contender, or will they follow Intel’s downward path?

The AI accelerator market, projected to exceed $500 billion in the coming years, stands as one of the most critical opportunities in modern computing. NVIDIA leads this race, driven by its CUDA ecosystem, integrated solutions, and dominance in AI workloads. Meanwhile, AMD, leveraging its MI300 series and strategic acquisitions, aims to emerge as a strong challenger.

Having a single player like NVIDIA controlling over 80% of the market poses significant challenges for the industry. A healthy ecosystem relies on competition and choice. The 2023 NVIDIA chip shortage highlighted the fragility of a one-player market and underscored the importance of AMD's success.

As the industry shifts from training to inference workloads and reasoning models like DeepSeek R1 rise to prominence, AMD seeks to capitalize on this transition. Overcoming NVIDIA’s entrenched dominance, however, will be challenging. The next decade will reveal whether AMD’s strategic decisions can secure a significant share of this highly competitive and rapidly evolving market.

Building Blocks of AMD

Founded in 1969 by former Fairchild Semiconductor employees, AMD has grown into the sixth-largest semiconductor firm, valued at $188 billion, with $24 billion in annual revenue and $6 billion in EBITDA. When Dr. Lisa Su became CEO in 2014, AMD faced declining market share and financial struggles. Her leadership drove a remarkable turnaround by emphasizing high-performance computing, strategic acquisitions like Xilinx, and innovation, reshaping AMD's role in the semiconductor industry.

AMD currently has operations that span four core segments:

  1. Client: Desktop and laptop processors, including the popular Ryzen chips.

  2. Data Center: High-performance server CPUs, GPUs, and AI accelerators for enterprise applications.

  3. Gaming: GPUs and custom processors powering gaming PCs, and consoles like Xbox and PS5.

  4. Embedded: Specialized processors for industrial, automotive, and IoT applications.

The Software Challenge: ROCm vs. CUDA

AMD's data center segment drives its largest revenue and offers a key growth opportunity. Since launching the EPYC CPU in 2017, AMD has gained significant traction with major cloud providers. In late 2023, AMD introduced the MI300 series, tailored for AI and HPC workloads. The MI300X, with more high-bandwidth memory (HBM) than NVIDIA’s H100, positions itself as a strong competitor for specific workloads. AMD further strengthened its position by partnering with Meta, announcing in 2024 that Llama 3.1 was optimized for AMD hardware.

AMD has adopted an open-standards approach to give developers flexibility and avoid platform lock-in. In 2016, it launched ROCm (Radeon Open Compute) as an open-source GPU computing platform. While initially designed for general high-performance computing workloads, ROCm has evolved to place particular emphasis on AI training and inference capabilities. ROCm features a low-level software stack consisting of drivers, libraries, and compilers that connects high-level frameworks like PyTorch, TensorFlow, or JAX to AMD GPUs. 

Despite 2.2x growth in its data center segment over the past two years, AMD still lags far behind NVIDIA. In Q3, NVIDIA reported $31 billion in data center revenue compared to AMD's $3.5 billion—a nearly 9x gap—with NVIDIA growing 2x+ as fast. 

A major factor in NVIDIA's dominance over AMD is software. NVIDIA's CUDA sets the industry standard, while AMD's open-source ROCm faces significant usability, stability, and performance issues. SemiAnalysis reports that AMD's public ROCm/PyTorch builds are buggy, outdated, and poorly optimized. Users frequently report crashes, immature kernel optimizations, and compatibility problems.

Training workloads, in particular, suffer due to the immaturity and instability of AMD's RCCL (ROCm Collective Communication Library) compared to NVIDIA's NCCL. RCCL is crucial for coordinating communication across large-scale GPU deployments, such as those involving hundreds or thousands of GPUs. 
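To see why these collective libraries matter, here is a toy, single-process sketch of the all-reduce operation that NCCL and RCCL implement across GPUs during training. This only shows the semantics; the names and data are illustrative, and real libraries pipeline chunked ring or tree schedules over NVLink or Infinity Fabric to make this bandwidth-efficient at scale.

```python
# Toy all-reduce: every "GPU" ends up with the element-wise sum of all
# gradient buffers. This is the operation NCCL/RCCL perform across real
# GPUs after each training step so every rank sees identical gradients.

def all_reduce(buffers):
    """buffers: list of per-GPU gradient lists, all the same length."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]  # every rank receives the sum

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 GPUs, 2 params each
reduced = all_reduce(grads)
assert reduced == [[9.0, 12.0]] * 3
```

The hard part, and where RCCL's immaturity shows, is not this arithmetic but scheduling it efficiently across hundreds or thousands of GPUs without stalls or crashes.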

SemiAnalysis notes that AMD's theoretical performance is typically accessible only to select VIP customers who receive custom Docker images and direct engineering support to address bugs, optimize settings, and unlock full hardware potential. Most users, however, must invest significant time in kernel tuning and optimization, highlighting AMD's poor out-of-the-box developer experience.

Despite ROCm's challenges, there are reasons for optimism. While ROCm's theoretical performance is currently accessible mainly to select customers, it rivals NVIDIA's in many areas; AMD must focus on broadening access to these advancements. In their earnings call, AMD highlighted ROCm improvements, noting that ROCm 6.2 increased MI300X inference performance by 2.4x since launch. Dylan Patel of SemiAnalysis revealed that AMD reached out after his report, arranging a meeting with Lisa Su to discuss his suggestions. He later shared on X that AMD is actively addressing these issues, reflecting strong and responsive leadership.

Shifting from Training to Inference Workloads

The AI industry is shifting its focus from training to inference workloads, which in many cases aligns with AMD’s strengths. Inference tasks often don’t require the advanced interconnect capabilities of NVIDIA’s NVLink for server-to-server communication. AMD’s MI300X excels in workloads where the model fits within a single server, typically with 8 GPUs. This gives AMD an advantage in simpler or smaller models that don’t rely on the RCCL library used in training.

As Waleed from Mako noted, a Llama 405B model at BF16 precision fits on a single server with 8 MI300X GPUs, while it requires two servers (nodes) with 8 NVIDIA H100 GPUs each. At FP8 precision, the model fits on one server with 8 H100 GPUs but needs only 4 MI300X GPUs, giving AMD a clear cost advantage for larger models contained within a single server.
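The arithmetic behind that comparison can be sketched roughly. The HBM capacities are the published per-GPU figures (192 GB for MI300X, 80 GB for H100); this counts weights only, ignoring KV cache and runtime overhead, which is why practical deployments use more GPUs than the bare minimum.

```python
# Rough weight-only memory math for serving a 405B-parameter model.
# Ignores KV cache, activations, and framework overhead, so real
# deployments need headroom beyond these minimums.

PARAMS = 405e9                      # Llama 3.1 405B
BYTES = {"bf16": 2, "fp8": 1}       # bytes per parameter
HBM = {"MI300X": 192, "H100": 80}   # GB of HBM per GPU

def gpus_needed(precision, gpu):
    weights_gb = PARAMS * BYTES[precision] / 1e9
    return -(-weights_gb // HBM[gpu])  # ceiling division

# BF16: 810 GB of weights.
assert gpus_needed("bf16", "MI300X") == 5  # fits in one 8-GPU server
assert gpus_needed("bf16", "H100") == 11   # exceeds one 8-GPU server
# FP8: 405 GB of weights.
assert gpus_needed("fp8", "H100") == 6     # fits in one 8-GPU server
assert gpus_needed("fp8", "MI300X") == 3   # the article cites 4 in practice
```

The bare minimum for FP8 on MI300X works out to 3 GPUs; the 4 cited above reflects the headroom and power-of-two tensor-parallel layouts used in practice.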

AMD’s edge could diminish in multi-GPU and multi-server scenarios, especially for reasoning models or large-scale inference. NVIDIA’s NVLink excels in such cases, offering the high-bandwidth, low-latency interconnectivity critical for splitting and sharing extensive key-value (KV) caches or running models that exceed a single server's memory. Reasoning models, which use methods like chain-of-thought reasoning or multi-path search, require seamless data sharing across GPUs to maintain high throughput and low latency. In contrast, AMD’s PCIe and Infinity Fabric links lack the scalability and efficiency for these tightly coupled tasks. Workloads requiring massive token counts or extended reasoning chains therefore benefit greatly from NVLink: NVIDIA's NVL72, for example, enables up to 72 GPUs to function cohesively as a single logical unit, significantly enhancing both cost efficiency and responsiveness when running reasoning models.
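A back-of-envelope KV-cache calculation shows why long reasoning chains stress memory and interconnect. The model dimensions below are illustrative (roughly a 70B-class model with grouped-query attention), not figures from the article.

```python
# KV-cache footprint: keys and values cached per layer, per token,
# per sequence. Long chain-of-thought generations multiply seq_len,
# so the cache alone can outgrow a single GPU and must be sharded.

def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per=2):
    # 2x for keys and values; bytes_per=2 assumes FP16/BF16 cache
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

# Hypothetical 80-layer model, 8 KV heads of dim 128, batch of 8:
short = kv_cache_gb(80, 8, 128, seq_len=4_096, batch=8)       # ~10.7 GB
long_chain = kv_cache_gb(80, 8, 128, seq_len=65_536, batch=8)  # ~172 GB
```

At a 64K-token reasoning chain, the cache alone (~172 GB here) exceeds any single GPU's HBM, so it must be split across GPUs, and how fast those GPUs exchange cache data is exactly where NVLink's bandwidth advantage shows up.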

However, recent advancements with the DeepSeek R1 reasoning model challenge the notion that multi-node interconnect is always essential for reasoning-model inference. By aggressively quantizing key layers, Unsloth AI demonstrated that even a 600B+ parameter model can fit on a single server: they reduced the 671B-parameter model from 720GB to 131GB, enabling it to run on 160GB of VRAM for fast inference.
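Unsloth's numbers imply an average of well under two bits per parameter, which a quick calculation confirms. The layer split at the end is hypothetical, purely to illustrate how a mixed-precision recipe can reach such an average; their actual recipe selectively keeps sensitive layers at higher precision.

```python
# Average bits per parameter implied by quantizing DeepSeek R1
# (671B parameters) from 720 GB down to 131 GB.

params = 671e9
full_gb, quant_gb = 720, 131

bits_full = full_gb * 8e9 / params    # ~8.6 bits/param before quantization
bits_quant = quant_gb * 8e9 / params  # ~1.56 bits/param on average after

# A hypothetical mixed recipe hitting that average: keep ~10% of
# weights at 4-bit and push the remaining ~90% toward ~1.3 bits.
avg = 0.10 * 4 + 0.90 * 1.3           # = 1.57 bits/param
```

The point is that sub-2-bit averages only work because quantization is selective: a uniform 1.5-bit model would degrade badly, but protecting a small fraction of critical layers preserves most of the quality.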

This development could favor AMD, though it's important to note that R1's quality doesn't match OpenAI's state-of-the-art o1-pro model. Superior models may still require multi-server setups, where NVIDIA's interconnectivity remains the top choice. However, releases like R1 start to reduce the scenarios where NVIDIA's interconnect advantage is critical.

Increasing competition beyond NVIDIA

So far we have focused on NVIDIA and AMD, but there is increasing competitive risk from Amazon, Google, and Microsoft. As Sid Pardeshi, founder of the autonomous software development startup Blitzy and a former NVIDIA engineer, shared:

"I am more bullish on Amazon and Google than AMD for two key reasons. First, Amazon’s exclusive partnership with Anthropic positions it uniquely, leveraging Trainium for training and Inferentia for inference. Second, companies like Amazon, Google, Microsoft are deeply incentivized to challenge NVIDIA’s dominance. They have their own model services like Bedrock, Vertex, and Azure, enabling them to gather customer usage insights and refine their hardware."

To date, Google's TPU is probably the second-best accelerator in the world, offering high-bandwidth interconnects and strong performance over long contexts. However, TPUs are exclusive to Google Cloud Platform (GCP), which limits their appeal in the broader market and offers AMD a big opening if it manages to improve the overall developer experience. It will also be interesting to see in the coming years whether Amazon's partnership with Anthropic proves more fruitful than AMD's partnership with Meta.

In addition to the hyperscalers, AMD faces increasing competition from inference-chip startups such as Groq, Cerebras, and Etched. For a deeper dive on these new players, refer to Eric Flannigan's analysis. However, with the inference server market projected to reach $133B by 2034, AMD could still see significant growth even with a small share capture.

Client Segment: The Promise of AI-Enabled PCs

AMD has long been associated with gaming, powering platforms like the Xbox and PS5. However, gaming revenue has dropped from over 30% to just 7% of total revenue due to the console market's cyclical downturn. Going forward, AMD has chosen to focus on mid-to-low-tier gaming and broad use cases rather than the high-end segment. While new console launches and AI-powered gaming may boost growth, gaming is unlikely to remain AMD’s primary driver, especially given this shift away from the high-end market. It remains unclear whether the decision reflects a strategic focus or an inability to compete with NVIDIA’s RTX 50 series.

The client segment, bolstered by AMD’s Ryzen processors, has experienced significant growth under Lisa Su's leadership. AMD has reduced Intel’s x86 CPU dominance from 83% to 62%, with the client segment now contributing 28% of AMD's revenue.


The next big wave? AI-enabled PCs. Statista predicts AI-capable PC shipments will rise from 19% in 2024 to 60% by 2027. On-device AI stands as a key feature, positioning AMD to leverage its CPU and GPU strengths and growing momentum against Intel. However, competition is heating up, with ARM aiming for 50% of the Windows PC market within five years and Qualcomm leading the ARM charge. Despite this, AMD's strong foothold in the PC sector positions it to capitalize on the AI-driven upgrade cycle.

AMD’s AI M&A + Interoperability startup ecosystem 

AMD has consistently leveraged M&A as a core strategy, focusing on expanding its portfolio, integrating hardware and software, and advancing AI capabilities. Through acquisitions in adaptive computing and infrastructure, AMD has strengthened its position as a leading provider of high-performance technology for data centers, AI workloads, and enterprise applications.

Notable Acquisitions:

  • Xilinx ($49B): A leader in adaptive computing, Xilinx’s solutions enhance AMD’s capabilities in data centers, 5G, automotive, and industrial applications, broadening its product portfolio for next-gen workloads.

  • Silo AI ($665M): Europe’s largest private AI lab strengthens AMD’s AI optimization for hardware-focused LLMs, adds 300 AI experts, and boosts its European presence and AI software ecosystem.

  • ZT Systems ($4.9B): Specializes in AI infrastructure, accelerating AMD’s ability to deliver scalable, integrated solutions for hyperscalers and enterprise customers.

  • Mipsology: Develops AI tools like Zebra AI, simplifying inference on AMD CPUs, GPUs, and adaptive SoCs, making AMD hardware more accessible to developers.

  • Nod.ai: Offers compiler-based automation and SHARK technology to optimize AI models across AMD platforms, aligning with its goal to simplify AI development and enhance performance.

A growing ecosystem of startups is also emerging around AMD and interoperability, which could become critical to AMD's success. For example, TensorWave provides access to training and inference on AMD GPUs. Other companies, like MAKO (a Flybridge portfolio company), are building hardware-aware compilers that leverage deep learning to optimize GPU kernels in real time. These could reduce the process of switching workloads from NVIDIA to AMD to a couple of lines of code. Waleed, the founder of MAKO, shared:

"Although it's no secret that AMD's GPU libraries are behind compared to Nvidia's, we have been pleasantly surprised by their progress. We see competitive performance on AMD for a broad set of AI inference workloads after tuning and optimization. Off-the-shelf performance still needs work. And yet, a healthy future for AI compute is shaping up with AMD, Google TPU, and AWS Trainium all becoming viable solutions to the AI market."

Final Thoughts: Can AMD Narrow the Gap?

AMD faces a significant challenge, with its software often plagued by bugs and subpar performance compared to NVIDIA's gold-standard CUDA. However, there are reasons for optimism. Leadership is listening to customers and focusing on the right priorities. While progress will take time, AMD is likely to close the gap gradually. The company claims its upcoming MI350 series will offer a 35x improvement in AI inference performance over previous models. Though past benchmarks suggest caution, this still provides some reason for excitement.

Despite the challenges, I remain optimistic about AMD’s trajectory. I expect AMD to improve some of its software issues and benefit as more enterprises deploy AI applications, driving significant growth in the inference market. I expect AMD stock to outperform the S&P 500 over the next five years. However, I don’t see AMD overtaking NVIDIA’s leading position. Beyond CUDA, NVIDIA is expanding into enterprise software with tools like NVIDIA NIM, NeMo, and Agentic AI Blueprints. NVIDIA appears focused on the next wave of innovation, while AMD seems to be playing catch-up. Furthermore, as advanced reasoning models evolve, NVIDIA's superior server-to-server performance will likely position it as the preferred choice for running powerful, resource-intensive reasoning models. That said, R1's release has diminished the need for multi-server setups in many use cases.

The semiconductor industry is entering an exciting phase fueled by continuous innovation. Success hinges on sustained execution, not a single chip breakthrough. AMD must shift part of its culture toward delivering an outstanding out-of-the-box experience and improving its ROCm software. Startups and enterprises should embrace AMD's efforts, as this competition will drive better performance and lower costs for consumers.

The key question remains: Can AMD challenge NVIDIA without falling into the same pitfalls as Intel?

If you are a founder leveraging AMD GPUs, I would love to hear about your experience: danie@flybridge.com
