Key Trends in Semiconductors for 2026
01
The AI hardware pyramid will collapse.
What will take its place next?
The current structure of the AI semiconductor industry is like an inverted pyramid. At the wide end are hundreds of chip companies, billions of dollars in capital investment, and continuous hardware innovation. At the narrow end sits a far smaller set of AI applications that have truly reached scaled production: the workloads all of that silicon is supposed to serve.
That lopsided 0.2%/50% ratio already makes clear where the industry's focus will shift this year. As Craig Melrose puts it:
"A lot of the focus is still on the hardware, rather than on solving real-world problems."
What will replace this pyramid is a more balanced structure: hardware will consolidate around a few dominant software ecosystems, connecting to proven end-application layers in manufacturing, physical AI, healthcare, and other fields. The companies that truly bridge this gap in 2026 will be those that bring to market a software stack capable of running real enterprise workloads — not just solutions that work in the lab. NVIDIA is a prime example.
02
The Next NVIDIA
Will Not Be Defined by the Chip Itself
NVIDIA won the AI chip race not because it had the best chip, but because it had CUDA: good enough, early enough, and sticky enough to become the default platform for AI development worldwide. Ian Baird puts it bluntly: for every company building custom accelerators today, the main obstacle is not hardware performance but software compatibility.
NVIDIA's advantage stems from its GPU architecture, which was built for parallel processing — and that architecture happens to align perfectly with the needs of AI. Every competitor since has had to climb uphill against a software ecosystem that keeps reinforcing itself. Craig Melrose, borrowing Geoffrey Moore's technology adoption curve, explains:
"Hardware is the innovator. What really determines who becomes the early adopter and who reaches the mainstream market is software."
For semiconductor companies developing or adopting new types of accelerators, the truly decisive investments should go into compiler toolchains, kernel libraries, and engineering teams capable of migrating workloads and addressing software gaps.
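The scale of that software investment is easy to underestimate. As a minimal sketch (every name here is illustrative, not any vendor's actual API), this is the shape of the problem a kernel-library team faces: each operator a workload uses needs either a native kernel for the new backend or a correct fallback path, or the migration stalls.

```python
# Hypothetical kernel registry: why "migrating workloads" is an engineering
# program, not a checkbox. Vendors must fill this table op by op.
KERNELS = {}  # (op_name, backend) -> implementation

def register(op_name, backend):
    def wrap(fn):
        KERNELS[(op_name, backend)] = fn
        return fn
    return wrap

def dispatch(op_name, backend, *args):
    # Prefer the backend's native kernel; otherwise fall back to a slow
    # reference implementation so the workload still runs during migration.
    fn = KERNELS.get((op_name, backend)) or KERNELS.get((op_name, "reference"))
    if fn is None:
        raise NotImplementedError(f"{op_name}: no kernel for {backend}")
    return fn(*args)

@register("matmul", "reference")
def matmul_ref(a, b):
    # Pure-Python reference: correct but slow; every port starts here.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# No native kernel registered for the (hypothetical) new NPU yet,
# so this call silently lands on the reference path.
result = dispatch("matmul", "new_npu", [[1, 2], [3, 4]], [[5, 6], [7, 8]])
# -> [[19, 22], [43, 50]]
```

The decisive work, in this framing, is shrinking the set of operators that fall through to the reference path on real customer workloads.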
The players that ultimately survive tend to share several common characteristics:
Deep investment in software
Focus on specific high-value workloads
Proven deployment models
If an accelerator delivers a 30% performance improvement on a specific workload that no one uses in production, it will not become a viable business. But if an accelerator improves efficiency by 20% on video inference tasks at the edge — with a complete software stack and a proven deployment path — then it is a viable business.
03
By 2027, Most AI Inference Will Run at the Edge — But the Software Isn't Ready Yet
The assumption that "AI inference runs by default in the cloud" is being challenged by two forces: the growth of physical AI applications, and the rising energy costs of datacenter compute. Craig puts it clearly:
"Most physical AI will ultimately run at the edge."
Robotics, autonomous vehicles, and factory floor systems all need to perform inference where the action happens. A humanoid robot cannot afford to wait for a round trip to the cloud before grasping an object.
"The real difficulty with edge AI is not getting powerful enough chips to the edge — it's getting the software to run correctly on all that hardware."
The bottleneck, then, is not the hardware. Today's NPU ecosystem is highly fragmented: AMD, Intel, Qualcomm, Apple, and dozens of other companies have each introduced their own neural processing units, with architectures and toolchains that are incompatible with one another. Ian gets straight to the heart of the problem: the real challenge is developing software that runs efficiently across such a heterogeneous ecosystem, and existing frameworks offer only partial solutions. The hardware needed for edge inference is falling into place; the software ecosystem to match it is far from ready.
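To make the fragmentation concrete, here is a hypothetical sketch (the vendor names and operator sets are invented for illustration) of why the same model has to be partitioned differently for every NPU toolchain: whatever a given toolchain cannot compile falls back to the CPU, and the fallback set differs per device.

```python
# Invented capability tables: each NPU toolchain supports a different
# operator subset, so deployment code must re-partition per device.
BACKEND_OPS = {
    "vendor_a_npu": {"conv2d", "relu", "matmul"},
    "vendor_b_npu": {"conv2d", "relu", "softmax"},
    "cpu":          {"conv2d", "relu", "matmul", "softmax", "layernorm"},
}

def partition(model_ops, npu):
    """Assign each op to the NPU if its toolchain supports it, else to CPU."""
    return {op: (npu if op in BACKEND_OPS[npu] else "cpu")
            for op in model_ops}

model = ["conv2d", "relu", "matmul", "softmax"]
plan_a = partition(model, "vendor_a_npu")  # softmax falls back to CPU
plan_b = partition(model, "vendor_b_npu")  # matmul falls back to CPU
```

The same four-op model yields two different placements, which is exactly the per-device engineering cost the text describes.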
04
In 2026, Chiplet Architecture Will Move from Niche to Mainstream
The era of treating the monolithic GPU as the default computing platform for AI is coming to an end. The diversification of AI workloads is making specialization increasingly cost-effective. Chiplet architecture allows companies to flexibly combine compute, memory, and I/O modules from different sources and different process nodes, enabling customization that was difficult to achieve with past monolithic chip designs.
Craig notes that companies like Modular are entering the hardware space in unconventional ways. They are developing chiplets and modular hardware, and even redefining what it means to be "a chip." Ian also cites D-Matrix as another example: it integrates ultra-low-latency memory with compute, specifically optimized for inference workloads such as video generation and prompt processing. Google's TPU, Microsoft's Maia, and Amazon's Trainium are all essentially betting on the same thing. The hyperscalers saw this years ago. By 2026, the rest of the market will truly begin to catch up.
05
By 2027, Inference Efficiency Will Matter More Than Raw FLOPS
When power becomes the constraint, efficiency becomes the moat. The frontier of inference optimization is increasingly shifting to the software layer.
Today, procurement discussions are already moving from peak compute performance to FLOPS-per-watt, latency-per-query, and cost-per-inference. In 2026 and 2027, the performance gains that truly matter will come increasingly from software-level inference optimizations — such as model distillation, quantization, compiler tuning, and right-sizing models to actual workload demands.
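As one concrete instance of these software-level optimizations, here is a minimal sketch of symmetric int8 quantization: weights shrink 4x and the arithmetic gets cheaper, at the cost of a bounded rounding error (the weight values below are arbitrary examples).

```python
# Symmetric int8 quantization sketch: map floats into [-128, 127]
# with a single per-tensor scale factor.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value sits within half a quantization step of the original.
```

The error bound (half the scale factor per weight) is what makes quantization a predictable engineering trade rather than a gamble.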
In nearly every real-world production scenario, a well-optimized small model running on an efficient chip will outperform an over-provisioned large model running on a power-hungry chip. Companies that treat inference optimization as a core engineering capability — rather than an afterthought — will have a structural cost advantage that latecomers will find very hard to match.
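That claim can be illustrated with back-of-the-envelope arithmetic (all numbers below are hypothetical, chosen only to show how the procurement metrics interact, not measurements of any real system).

```python
# Cost-per-inference comparison: the metric procurement now asks about,
# rather than peak FLOPS. All figures are illustrative.
def cost_per_million_inferences(watts, inferences_per_sec, usd_per_kwh):
    seconds = 1_000_000 / inferences_per_sec
    kwh = watts * seconds / 3600 / 1000
    return kwh * usd_per_kwh

# Over-provisioned large model on a power-hungry accelerator:
big = cost_per_million_inferences(watts=700, inferences_per_sec=200,
                                  usd_per_kwh=0.12)
# Right-sized small model on an efficient chip, despite lower throughput:
small = cost_per_million_inferences(watts=75, inferences_per_sec=120,
                                    usd_per_kwh=0.12)
# Under these assumptions the smaller deployment is several times cheaper
# per inference even though it serves fewer queries per second.
```

The point of the sketch is structural: once power is priced in, raw throughput stops being a proxy for cost.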
06
By 2028, Data Centers Will Face Power Shortfalls
There is a critical risk in AI infrastructure construction that remains underestimated today: the pace of energy supply expansion cannot keep up with the growth rate of AI compute demand. Gas turbines — currently the fastest path to adding new power generation capacity — have their production capacity booked through 2028. Power supply is becoming a hard constraint on data center expansion.
Early localized cases are already emerging. For example, a utility company in Nevada, USA, plans to prioritize power for data centers over existing telecommunications infrastructure needs. Cases like this show the constraint beginning to move from a paper problem into reality. Craig directly poses the follow-up question:
"What happens if the very data centers that critical systems rely on face an energy gap? If a key production system depends on real-time cloud inference, and that data center has to shed load due to insufficient power, who bears the responsibility?"
This is not a theoretical question — it is an operational reality. Organizations that start building resilience now through edge deployment and workload prioritization will be better positioned as constraints tighten further.
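Workload prioritization under a power budget can be sketched as a simple greedy policy (the job names, priority levels, and wattages below are invented for illustration): keep the most critical inference jobs within the budget and shed the rest to edge capacity or deferral.

```python
# Greedy load-shedding sketch: admit jobs in priority order until the
# power budget is exhausted; everything else is a shedding candidate.
def shed_load(jobs, power_budget_watts):
    """jobs: list of (name, priority, watts); lower priority = more critical."""
    kept, shed, used = [], [], 0
    for name, priority, watts in sorted(jobs, key=lambda j: j[1]):
        if used + watts <= power_budget_watts:
            kept.append(name)
            used += watts
        else:
            shed.append(name)  # candidates for edge fallback or deferral
    return kept, shed

jobs = [("safety_monitor", 0, 40), ("fraud_scoring", 1, 60),
        ("batch_analytics", 3, 120), ("recsys_refresh", 2, 80)]
kept, shed = shed_load(jobs, power_budget_watts=150)
```

Even a policy this crude forces the question the quote raises: someone has to decide, in advance, which workloads lose power first.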
07
In 2026, Physical AI Will Grow Faster Than Data AI
The next wave of AI chip demand will come from AI embedded in physical systems — robots, cars, factories, and consumer devices. Craig believes this trend has already begun: the growth rate of applications that require local, real-time inference has already surpassed that of datacenter-centric applications, which dominated the previous cycle.
The evidence is already visible in adjacent markets. Smartwatches, portable ECG devices, smart rings, and other consumer devices — as well as industrial automation and autonomous vehicles — all require edge inference, low power consumption, vertically oriented software stacks, and the ability to run reliably in real-world environments that are completely different from controlled datacenter settings.
And these are precisely the applications that today's software toolchains are least equipped to serve. The companies that ultimately succeed in bringing AI into the physical world will be the first to realize that hardware problems and software problems have never been two separate things.
08
Conclusion
The hardware race in the semiconductor industry is already in full swing, but the final outcome will be decided by software. The accelerators and platforms that survive this shakeout must meet several conditions: they need to connect to proven application scenarios, be backed by a robust software ecosystem, and be designed for a world where edge-first and power-constrained are the operational realities.