
Increasingly, neural processing units (NPUs) are making their way into consumer electronics: laptops, high-end tablets, phones, and more. But what do they do, and why are they suddenly showing up?
Answering this question is Benjamin C. Lee, a professor in the departments of Electrical and Systems Engineering and Computer and Information Science at the School of Engineering and Applied Science.
Lee began his career as a computer architect who, he says, “thinks a lot about processors and hardware systems.” He explains that while general-purpose central processing units, or CPUs, are the bread and butter for processor designers, a smaller cohort works on NPUs—one that is poised to grow in the years ahead.
“NPUs, that’s where the frontier is and the number of design teams there is much, much smaller,” he says.
In his research at Penn, Lee thinks through sustainability solutions for hardware technologies: processors, data centers, batteries, renewable energy, and amortization, which refers to extending the useful life of computer chips. That work is ongoing and has included collaboration with the private sector, including Meta and Google.
Here, Lee defines the NPU, explains its function in the age of artificial intelligence (AI), and describes opportunities to reduce its carbon footprint.
A neural processing unit is a piece of hardware, a chip, that’s customized to do particularly well on the matrix arithmetic that AI relies on. It is intended to support inference, which means responding to a request to a trained model. Suppose you have downloaded a trained model onto your device and now you want to ask it questions or issue prompts. The model is probably small because it’s sitting within your personal device and it’s going to be able to perform tasks locally without going to some remote data center across the internet.
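To make “matrix arithmetic” concrete: one step of inference through a single layer of a trained model is essentially a matrix-vector product plus a simple nonlinearity. Below is a minimal sketch in Python with NumPy, using random stand-ins for the trained weights; the layer sizes are illustrative, not from any particular model.

```python
import numpy as np

# One layer of a (toy) trained model: y = relu(W @ x + b).
# In practice W and b would come from the downloaded model;
# here they are random stand-ins.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # weight matrix: 512 inputs -> 256 outputs
b = rng.standard_normal(256)         # bias vector
x = rng.standard_normal(512)         # input features, e.g., an encoded prompt

y = np.maximum(W @ x + b, 0.0)       # the matrix arithmetic an NPU accelerates
print(y.shape)                       # (256,)
```

A full model is many such layers stacked together, so nearly all of the work of answering a prompt reduces to operations of this shape.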
That’s right. When you look at something like Apple Intelligence, for example, it actually combines a mix of capabilities. Some of those capabilities and tasks use smaller models, which can be processed on the device using the NPU and other local chips. But Apple Intelligence also has a fallback plan. Complicated tasks that require larger models will be sent to the data center for processing. So, the solution offers hybrid software models that use different types of hardware systems.
Right. It really depends on whether you decide to use the particular model that has been sized and tuned to run on the NPU.
No, but I think it would end up being a lot slower and potentially less energy-efficient, which is particularly problematic for personal electronics that run on batteries. NPUs are preferred because they are a lot more energy-efficient for that class of computation. The way to think about CPUs, generally, is that they perform arithmetic on a single input, a scalar value. For example, consider A + B = C, where A is a single value, B is a single value, and so on. Each value is fed into the CPU for individual calculations, which is broadly useful but inefficient when many calculations are required.
GPUs [graphics processing units] and NPUs are more efficient because they don’t deal with individual scalar values; they deal with vectors of values, or matrices of values, and deal with them all in one go. Their hardware structures can consume a matrix of values, and that’s where most of the speed and energy efficiency comes from. They compute an answer with much less data movement, much less communication, and therefore much less energy.
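The scalar-versus-vector distinction can be seen directly in code. In this minimal Python/NumPy sketch (the array sizes are arbitrary), the explicit loop mirrors the CPU’s one-scalar-at-a-time A + B = C style, while the single vectorized expression hands the hardware whole arrays in one go:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((512, 512))
B = rng.standard_normal((512, 512))

# CPU-style scalar thinking: one A + B = C calculation per element.
C_scalar = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C_scalar[i, j] = A[i, j] + B[i, j]

# Vector/matrix-style thinking: the whole operation in one go,
# with far fewer trips through the instruction and memory pipeline.
C_vector = A + B

assert np.allclose(C_scalar, C_vector)
```

The two computations produce identical answers; the difference is how much bookkeeping and data movement each one requires per arithmetic operation, which is where the speed and energy savings come from.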
I think it’s primarily AI. There are additional, broader benefits for traditional audio or image processing, but it is the excitement around AI that has sparked interest in hardware design and deployment in consumer electronics. The N is for neural processing.
Yes. I don’t know the marginal cost and how much more expensive a chip becomes. But essentially the NPU either requires fabricating another chip or designing a larger chip that includes those additional hardware structures. The NPU may also require separate or additional memory to hold its data.
That’s something we’ve been thinking a lot more about recently, especially in consumer electronics because we know that there is an environmental cost or a carbon footprint associated with semiconductor manufacturing. When you think about semiconductor fabs, the factories that manufacture chips, roughly half of their carbon footprint is associated with electricity generated to run those fabs. These fabs are often located in places, such as Taiwan or Korea, that do not have a lot of renewable or carbon-free energy. The other half of that carbon footprint comes from gases that have global warming potential. These gases would need to be mitigated or abated in some way.
Even if fabs were to use only carbon-free energy, you might still have a good 40-50% of carbon-equivalent emissions remaining from the chemistry and processes of [fabricating] the chips. Given this perspective, improving hardware sustainability requires reducing, reusing, or extending the lifetime of the chips being shipped. This problem is most pressing for consumer electronics. We’re a bit less worried about semiconductor manufacturing for data centers because when, for example, Google buys a high-performance CPU or GPU, they are running it full-throttle at 100%. They’re getting really good use out of that chip, and the fancy way of saying it is that they’re amortizing the cost of manufacturing the hardware over lots of useful work and over a very long lifetime, perhaps eight to 10 years.
It’s like a rental car. Consider the carbon costs of manufacturing the car. Compared to a personal vehicle, a rental accumulates useful miles quickly and better amortizes the manufacturing costs. There’s some upside in the heavy utilization of data center hardware. But you’re not getting that same utilization in consumer electronics. We’re still refreshing our phones maybe every two or three years, and the increasingly capable chips inside them are often idle because their owners do not require continuous compute, and the batteries wouldn’t permit it anyway.
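The amortization argument is simple arithmetic: embodied manufacturing carbon divided by hours of actual use. Here is a back-of-the-envelope sketch in Python; every number is a made-up placeholder rather than a measured figure, chosen only to show how utilization and lifetime drive the result:

```python
# Hypothetical embodied carbon of manufacturing one chip, in kg CO2-equivalent.
EMBODIED_KG_CO2E = 50.0

def carbon_per_useful_hour(lifetime_years: float, utilization: float) -> float:
    """Embodied carbon amortized over hours of actual use."""
    useful_hours = lifetime_years * 365 * 24 * utilization
    return EMBODIED_KG_CO2E / useful_hours

# Data-center chip: ~9-year life, run near full throttle.
dc = carbon_per_useful_hour(9, 0.95) * 1000
# Phone chip: ~3-year life, mostly idle.
phone = carbon_per_useful_hour(3, 0.05) * 1000

print(f"data center: {dc:.2f} g CO2e per useful hour")   # ~0.67
print(f"phone:       {phone:.2f} g CO2e per useful hour")  # ~38
```

Under these placeholder assumptions the same manufacturing footprint costs roughly fifty times more per useful hour in a lightly used phone than in a heavily used server, which is the point of the rental-car analogy.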
Exactly. So that’s probably what some of these companies are hoping to market. Essentially, the sustainability impact comes down to the carbon footprint of semiconductor manufacturing.
To first order, the carbon footprint increases proportionally with the size of the chip, so if you double the size of the chip, the carbon footprint associated with it doubles. This goes to the question of how big that NPU is compared to the original CPU. As the chip gets bigger, costs will increase. When we talk about the NPU and semiconductor manufacturing, the NPU chip itself has a manufacturing footprint but the DRAMs, the memories, are also chips that must be fabricated.
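That first-order model, embodied carbon proportional to silicon area, with DRAM counted as silicon too, can be written down directly. In this sketch, both the per-square-millimeter factor and the die areas are hypothetical placeholders:

```python
# First-order model: embodied carbon scales linearly with die area.
# The factor and areas below are hypothetical, for illustration only.
KG_CO2E_PER_MM2 = 0.01

def embodied_carbon(areas_mm2: dict) -> float:
    """Sum the manufacturing footprint over every chip in the package."""
    return sum(area * KG_CO2E_PER_MM2 for area in areas_mm2.values())

soc_without_npu = {"cpu": 100.0, "dram": 80.0}
soc_with_npu = {"cpu": 100.0, "npu": 30.0, "dram": 120.0}  # NPU adds logic and memory

print(embodied_carbon(soc_without_npu))  # 1.8 kg CO2e
print(embodied_carbon(soc_with_npu))     # 2.5 kg CO2e
```

The sketch makes the interview’s point mechanical: adding an NPU grows the footprint twice over, once for the accelerator’s own silicon and once for the extra DRAM it needs.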
I think there are two aspects. One is that, everything else being equal, a larger NPU can get things done more quickly. For example, if you’re using a chatbot, larger prompts can be consumed and understood more quickly; that’s where the benefit of a larger, more capable NPU comes from. Second, as models get bigger, NPUs will need to be supported by larger memories. I expect continuing advances both in NPUs for computation and in the memory systems that hold the models.