
Increasingly, neural processing units (NPUs) are making their way into consumer electronics: laptops, high-end tablets, phones, and more. But what do they do, and why are they suddenly showing up?
Answering this question is Benjamin C. Lee, a professor in the departments of Electrical and Systems Engineering and Computer and Information Science at the School of Engineering and Applied Science.
Lee began his career as a computer architect who, he says, “thinks a lot about processors and hardware systems.” He explains that while general-purpose central processing units, or CPUs, are the bread and butter for processor designers, a smaller cohort works on NPUs—one that is poised to grow in the years ahead.
“NPUs, that’s where the frontier is and the number of design teams there is much, much smaller,” he says.
In his research at Penn, Lee thinks through sustainability solutions for hardware technologies: processors, data centers, batteries, renewable energy, and amortization, which refers to extending the useful life of computer chips. That work is ongoing and has included collaboration with the private sector, including Meta and Google.
Here, Lee defines the NPU, explains its function in the age of artificial intelligence (AI), and describes opportunities to reduce its carbon footprint.
A neural processing unit is a piece of hardware, a chip, that’s customized to do particularly well on the matrix arithmetic that AI relies on. It is intended to support inference, which means responding to a request to a trained model. Suppose you have downloaded a trained model onto your device and now you want to ask it questions or issue prompts. The model is probably small because it’s sitting within your personal device and it’s going to be able to perform tasks locally without going to some remote data center across the internet.
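To make “matrix arithmetic” concrete: one step of inference through a single layer of a trained model is essentially a matrix-vector product plus a simple nonlinearity. Below is a minimal sketch in Python with NumPy, using random stand-ins for the trained weights; the layer sizes are illustrative, not from any particular model.

```python
import numpy as np

# One layer of a (toy) trained model: y = relu(W @ x + b).
# In practice W and b would come from the downloaded model;
# here they are random stand-ins.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # weight matrix: 512 inputs -> 256 outputs
b = rng.standard_normal(256)         # bias vector
x = rng.standard_normal(512)         # input features, e.g., an encoded prompt

y = np.maximum(W @ x + b, 0.0)       # the matrix arithmetic an NPU accelerates
print(y.shape)                       # (256,)
```

A full model is many such layers stacked together, so nearly all of the work of answering a prompt reduces to operations of this shape.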
That’s right. When you look at something like Apple Intelligence, for example, it actually combines a mix of capabilities. Some of those capabilities and tasks use smaller models, which can be processed on the device using the NPU and other local chips. But Apple Intelligence also has a fallback plan. Complicated tasks that require larger models will be sent to the data center for processing. So, the solution offers hybrid software models that use different types of hardware systems.
Right. It really depends on whether you decide to use the particular model that has been sized and tuned to run on the NPU.
No, but I think it would end up being a lot slower and potentially less energy-efficient, which is particularly problematic for personal electronics that run on batteries. NPUs are preferred because they are a lot more energy-efficient for that class of computation. The way to think about CPUs, generally, is that they perform arithmetic on a single input, a scalar value. For example, consider A + B = C, where A is a single value, B is a single value, and so on. Each value is fed into the CPU for individual calculations, which is broadly useful but inefficient when many calculations are required.
GPUs [graphics processing units] and NPUs are more efficient because they don’t deal with individual scalar values; they deal with vectors of values, or matrices of values, and deal with them all in one go. Their hardware structures can consume a matrix of values, and that’s where most of the speed and energy efficiency comes from. They compute an answer with much less data movement, much less communication, and therefore much less energy.
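The scalar-versus-vector distinction can be seen directly in code. In this minimal Python/NumPy sketch (the array sizes are arbitrary), the explicit loop mirrors the CPU’s one-scalar-at-a-time A + B = C style, while the single vectorized expression hands the hardware whole arrays in one go:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((512, 512))
B = rng.standard_normal((512, 512))

# CPU-style scalar thinking: one A + B = C calculation per element.
C_scalar = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C_scalar[i, j] = A[i, j] + B[i, j]

# Vector/matrix-style thinking: the whole operation in one go,
# with far fewer trips through the instruction and memory pipeline.
C_vector = A + B

assert np.allclose(C_scalar, C_vector)
```

The two computations produce identical answers; the difference is how much bookkeeping and data movement each one requires per arithmetic operation, which is where the speed and energy savings come from.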
I think it’s primarily AI. There are additional, broader benefits for traditional audio or image processing, but it is the excitement around AI that has sparked interest in hardware design and deployment in consumer electronics. The N is for neural processing.
Yes. I don’t know the marginal cost and how much more expensive a chip becomes. But essentially the NPU either requires fabricating another chip or designing a larger chip that includes those additional hardware structures. The NPU may also require separate or additional memory to hold its data.
That’s something we’ve been thinking a lot more about recently, especially in consumer electronics because we know that there is an environmental cost or a carbon footprint associated with semiconductor manufacturing. When you think about semiconductor fabs, the factories that manufacture chips, roughly half of their carbon footprint is associated with electricity generated to run those fabs. These fabs are often located in places, such as Taiwan or Korea, that do not have a lot of renewable or carbon-free energy. The other half of that carbon footprint comes from gases that have global warming potential. These gases would need to be mitigated or abated in some way.
Even if fabs were to use only carbon-free energy, you might still have a good 40-50% of carbon-equivalent emissions remaining from the chemistry and processes of [fabricating] the chips. Given this perspective, improving hardware sustainability requires reducing, reusing, or extending the lifetime of the chips being shipped. This problem is most pressing for consumer electronics. We’re a bit less worried about semiconductor manufacturing for data centers because when, for example, Google buys a high-performance CPU or GPU, they are running it full-throttle at 100%. They’re getting really good use out of that chip, and the fancy way of saying it is that they’re amortizing the cost of manufacturing the hardware over lots of useful work and over a very long lifetime, perhaps eight to 10 years.
It’s like a rental car. Consider the carbon costs of manufacturing the car. Compared to a personal vehicle, a rental accumulates useful miles quickly and better amortizes the manufacturing costs. There’s some upside in the heavy utilization of data center hardware. But you’re not getting that same utilization in consumer electronics. We’re still refreshing our phones maybe every two or three years, and the increasingly capable chips inside them are often idle because their owners do not require continuous compute, and the batteries wouldn’t permit it anyway.
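The amortization argument is simple arithmetic: embodied manufacturing carbon divided by hours of actual use. Here is a back-of-the-envelope sketch in Python; every number is a made-up placeholder rather than a measured figure, chosen only to show how utilization and lifetime drive the result:

```python
# Hypothetical embodied carbon of manufacturing one chip, in kg CO2-equivalent.
EMBODIED_KG_CO2E = 50.0

def carbon_per_useful_hour(lifetime_years: float, utilization: float) -> float:
    """Embodied carbon amortized over hours of actual use."""
    useful_hours = lifetime_years * 365 * 24 * utilization
    return EMBODIED_KG_CO2E / useful_hours

# Data-center chip: ~9-year life, run near full throttle.
dc = carbon_per_useful_hour(9, 0.95) * 1000
# Phone chip: ~3-year life, mostly idle.
phone = carbon_per_useful_hour(3, 0.05) * 1000

print(f"data center: {dc:.2f} g CO2e per useful hour")   # ~0.67
print(f"phone:       {phone:.2f} g CO2e per useful hour")  # ~38
```

Under these placeholder assumptions the same manufacturing footprint costs roughly fifty times more per useful hour in a lightly used phone than in a heavily used server, which is the point of the rental-car analogy.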
Exactly. So that’s probably what some of these companies are hoping to market. Essentially, the sustainability impact comes down to the carbon footprint of semiconductor manufacturing.
To first order, the carbon footprint increases proportionally with the size of the chip, so if you double the size of the chip, the carbon footprint associated with it doubles. This goes to the question of how big that NPU is compared to the original CPU. As the chip gets bigger, costs will increase. When we talk about the NPU and semiconductor manufacturing, the NPU chip itself has a manufacturing footprint but the DRAMs, the memories, are also chips that must be fabricated.
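That first-order model, embodied carbon proportional to silicon area, with DRAM counted as silicon too, can be written down directly. In this sketch, both the per-square-millimeter factor and the die areas are hypothetical placeholders:

```python
# First-order model: embodied carbon scales linearly with die area.
# The factor and areas below are hypothetical, for illustration only.
KG_CO2E_PER_MM2 = 0.01

def embodied_carbon(areas_mm2: dict) -> float:
    """Sum the manufacturing footprint over every chip in the package."""
    return sum(area * KG_CO2E_PER_MM2 for area in areas_mm2.values())

soc_without_npu = {"cpu": 100.0, "dram": 80.0}
soc_with_npu = {"cpu": 100.0, "npu": 30.0, "dram": 120.0}  # NPU adds logic and memory

print(embodied_carbon(soc_without_npu))  # 1.8 kg CO2e
print(embodied_carbon(soc_with_npu))     # 2.5 kg CO2e
```

The sketch makes the interview’s point mechanical: adding an NPU grows the footprint twice over, once for the accelerator’s own silicon and once for the extra DRAM it needs.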
I think there are two aspects. One is that, everything else being equal, a larger NPU can get things done more quickly. For example, if you’re using a chatbot, larger prompts can be consumed and understood more quickly; that’s where the benefit of a larger, more capable NPU comes from. Second, as models get bigger, NPUs will need to be supported by larger memories. I expect continuing advances both in NPUs for computation and in the memory systems that hold the models.