World's largest chip aims for a 4,000x improvement with optical interconnect
Cerebras, the developer of a wafer-scale processor, is working on optical interconnect to boost its system performance by a factor of 4,000, and is calling for industry collaboration and standardization.
The current Cerebras WSE-3 processor is built on a 300mm wafer with 4 trillion transistors and a power consumption of around 20kW. The California-based company has had to develop its own wafer-scale packaging for I/O, power delivery and cooling, and is now working on optical interconnects.
The company's chief system architect spoke at Leti Innovation Days in Grenoble, France, this week, exploring how to address scalability challenges with chiplets and 3D heterogeneous packaging technology. "It's not a small chip, but it's still a candidate for 3D integration," said JP Fricker, Cerebras co-founder and chief system architect. "This technology will be transformative."
However, a key limitation for performance, scalability, and power consumption is off-chip I/O.
"I/O is a limitation of large computing that prevents you from getting into very large systems. These technologies exist today, but we need to invent technologies to put them together. We are developing these technologies and our goal is to build supercomputers that are 4,000 times faster than today and connect 1,000 wafers together."
"Currently, I/O is located on both edges of the chip, but it works better if I/O is distributed across the chip. Shortening the channel length reduces the size of the SERDES, saving space and power."
"We want to have a lot of optical engines," he said. "Right now they're external, but eventually we'll put these lasers in a chip." These will be used for multiple communication channels with reasonable data rates of 100 to 200Gbit/s, rather than thick pipes, he said.
"We have our own wafer-level engine and take third-party wafer-level programmable optical interconnects and put them together, using the entire surface of the wafer to connect to the wafer," he said. "It requires a heterogeneous wafer-to-wafer package."
Companies such as Celestial AI and Lightmatter have been developing these optical interconnect technologies, particularly for hyperscalers and AI chip companies.
"But we need to invent or repurpose technology. The current interconnect pitch is too thick and we can't get fabs willing to integrate the technology because it's too niche, so we need to create a different process. Hybrid bonding enables finer pitch and higher assembly yields below 12 microns, but it is only available in specific fabs, and there are limited process pairs in fabs, such as 5nm to 5nm wafers, but different foundries cannot be used, and this is also true after two years."
There are also challenges in the process steps.
"For hybrid bonding, the fab stops at one of the last copper layers, which is not easy to detect, but that makes shipping to another fab difficult."
"We want to develop a new technology to standardize the surface treatment of wafers through a common top layer, and use this layer as a standard interface for wafer stacking, so that different wafers can be manufactured in different ways, but the last set of interfaces is common for bonding between different factories." It also means that bonding can be done by a third party, not just a high-volume factory, "he said.
The marks left by test probes on the copper layer are also a problem for planarization; these marks must either be removed or a non-contact test system used.
But he says the approach has significant advantages.
"We can transmit power through optical wafers because the elements are more sparse, there are many through-silicon holes (TSVS) and very short channels, and the elements are located in a single layer by using multiple wavelengths." This makes it possible to transmit power from the top and remove cooling from the bottom in the same system."
"In our case, the network on the compute wafer is based on a configurable structure that is set up before the workload is run on the wafer. When you do this with circuit switching in the optical domain, you can evolve electrical switching into the optical domain, but you don't need to do it very often.
Crossing Nvidia's moat
How wide is Nvidia's moat? That's the $3 trillion question on investors' minds today. At least part of the answer may come later this year in the form of an IPO. Cerebras Systems, an AI startup that is trying to challenge Nvidia on the AI chip battlefield, is set for an initial public offering by the end of 2024.
Lior Susan, founder and managing partner of Eclipse Ventures, first invested in Cerebras in 2015, when the company had five presentation slides and theoretical plans for a new computer architecture. Eight years later, the startup offers unusually large chips with lots of memory for generative AI workloads like model training and inference. These compete against Nvidia chips, including the B100 and H100.
The most "annoying" thing about competing with Nvidia is CUDA but according to Susan,
CUDA is a soft