Emerging memory technologies like ReRAM and PCM are redefining AI hardware by overcoming the memory bottleneck that limits neural network performance. This article explores why traditional DRAM and NAND struggle with AI workloads, how in-memory computing works, and the practical roles of ReRAM and PCM in modern AI architectures. Discover the strengths, limitations, and real-world applications of these next-generation memory types in hybrid AI systems.
The rapid growth of artificial intelligence has transformed the demands placed on computer hardware faster than any previous technological wave. While processor speed once defined performance, today the spotlight is on memory: its latency, bandwidth, and energy consumption. Memory has become a primary bottleneck limiting the scalability of neural networks and the efficiency of AI chips. As a result, emerging memory types like ReRAM and PCM are capturing attention as alternatives or supplements to traditional DRAM and NAND, especially for AI and in-memory computing. These technologies promise lower energy loss, reduced latency, and a closer integration of data storage with computation.
Modern AI systems are increasingly constrained not by compute units, but by memory. GPUs and specialized AI accelerators can handle massive parallel operations, but their effectiveness hinges on how quickly they can access data. Here, conventional architectures begin to falter.
The crux of the issue lies in the gap between computation and memory. DRAM is physically separated from compute arrays, while NAND serves as even more distant storage. For neural networks, this means weights and activations are constantly shuttled over data buses. Even with high bandwidth, the latency and energy costs of data movement can rival or exceed the cost of computation itself.
This effect is acute in AI tasks. Training and inference involve massive matrix operations: thousands of multiplications and additions that require continuous memory access. Paradoxically, compute units often sit idle waiting for data, and system energy consumption rises not because of computation, but because of moving bits around.
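To make that imbalance concrete, consider a rough back-of-envelope estimate. The per-operation energies below are assumptions, loosely in line with commonly cited figures for older process nodes, where an off-chip DRAM access costs orders of magnitude more energy than an arithmetic operation; the layer size is hypothetical.

```python
# Illustrative only: compares arithmetic energy with the energy of fetching
# each weight once from off-chip DRAM for a single fully connected layer.
E_MAC_PJ = 4.0      # assumed energy per 32-bit multiply-accumulate, in picojoules
E_DRAM_PJ = 640.0   # assumed energy per 32-bit off-chip DRAM access, in picojoules

in_features, out_features = 4096, 4096       # hypothetical layer dimensions

macs = in_features * out_features            # one multiply-accumulate per weight
weight_fetches = in_features * out_features  # each weight read from DRAM once

compute_uj = macs * E_MAC_PJ / 1e6
movement_uj = weight_fetches * E_DRAM_PJ / 1e6

print(f"compute:  {compute_uj:8.1f} uJ")
print(f"movement: {movement_uj:8.1f} uJ")
print(f"movement / compute: {movement_uj / compute_uj:.0f}x")
```

Even under these crude assumptions, simply moving each weight once costs far more energy than using it for arithmetic.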
This so-called memory wall presents limits that scaling alone cannot solve. Boosting DRAM frequency yields diminishing returns, and adding memory channels drastically complicates chip design and drives up cost. Consequently, the focus has shifted from accelerating computation to rethinking the role of memory in AI hardware architectures.
In this context, new approaches are emerging where memory is no longer just a passive storage element but participates directly in computation. The concept of in-memory computing is a logical response to the fundamental constraints of classical models.
In-memory computing is an architectural paradigm where operations are performed directly where data resides, eliminating the constant shuttling of information to and from separate compute blocks. For AI, this means abandoning the classic "memory → processor → memory" scheme, which is responsible for most delays and excess energy use.
This approach is especially logical for neural networks, where most computations boil down to multiplying weight matrices by input data and accumulating results. If the weights are already stored in memory cells, the physical process of reading can itself be used for computation, for example by summing currents or exploiting resistance changes. Here, data is not moved; the result is generated right inside the memory array.
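A minimal numerical analogy of that idea: if weights are stored as conductances and inputs are applied as voltages, each column current is, by Ohm's and Kirchhoff's laws, the dot product of the input vector with that column of weights. The NumPy sketch below only simulates this behavior; the array sizes and values are illustrative.

```python
import numpy as np

# Digital simulation of an analog in-memory matrix-vector multiply:
# the stored conductances are the weights, and "reading" the array with
# input voltages yields the output currents, i.e. the product itself.
rng = np.random.default_rng(0)

G = rng.uniform(0.0, 1.0, size=(128, 64))   # conductances = stored weights (arbitrary units)
v = rng.uniform(0.0, 0.2, size=128)         # input voltages on the rows

# Each column current is sum_i v[i] * G[i, j]: the multiply-accumulate
# happens during the read, with all 64 columns computed in parallel.
i_out = v @ G

assert np.allclose(i_out, sum(v[k] * G[k] for k in range(128)))
print(i_out[:5])
```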
The key benefit of in-memory computing is dramatic energy savings. Transferring data between memory and compute blocks consumes much more energy than the arithmetic itself. For AI systems, especially in data centers and on edge devices, this is critical: the constraint is no longer raw performance but the heat and energy budget.
This approach also scales better. Instead of ramping up compute core frequency or complexity, one simply expands memory arrays, with each cell participating in computation. This paves the way for neuromorphic and matrix architectures that most closely mimic biological neural networks.
However, in-memory computing is not feasible on traditional DRAM and NAND without major compromises. These memory types are not built for analog operations or massive parallelism at the cell level. That's why researchers and manufacturers are now focusing on alternative memories like ReRAM and PCM, which are physically better suited to AI workloads.
ReRAM (Resistive RAM) stores data using material resistance rather than charge (as in DRAM) or floating gates (as in NAND). A ReRAM cell switches between high and low resistance states via short electrical pulses. This physical simplicity makes ReRAM especially attractive for AI.
ReRAM's main advantage is its natural compatibility with in-memory computing. By adjusting the resistance of each cell, neural network weights can be directly encoded within the memory array. When voltage is applied, currents sum according to physical laws, and the matrix-vector multiplication essentially "happens by itself." This is not digital arithmetic but analog computation performed in parallel across thousands of cells.
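Conductances are non-negative, so signed weights are typically represented differentially: each weight maps to a pair of conductances and the two column currents are subtracted. The sketch below shows one such mapping under idealized assumptions (no device noise, simple max-based scaling); the layer shape and helper name are illustrative.

```python
import numpy as np

def weights_to_conductance_pair(W, g_max=1.0):
    """Map signed weights onto two non-negative conductance arrays,
    scaled so the largest weight magnitude uses g_max."""
    scale = g_max / np.max(np.abs(W))
    G_plus = np.clip(W, 0, None) * scale      # positive parts of the weights
    G_minus = np.clip(-W, 0, None) * scale    # magnitudes of the negative parts
    return G_plus, G_minus, scale

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))   # signed weights of a hypothetical layer
x = rng.normal(size=64)         # input activations applied as voltages

G_plus, G_minus, scale = weights_to_conductance_pair(W)

# Two reads of the array; subtracting the column currents recovers the
# signed matrix-vector product up to the known scale factor.
y = (x @ G_plus - x @ G_minus) / scale
assert np.allclose(y, x @ W)
```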
From an energy-efficiency perspective, ReRAM outperforms DRAM by orders of magnitude. Energy is spent on local interactions, not data movement. For AI accelerators, this means inference can be executed with minimal heat generation-crucial for edge devices, autonomous sensors, and mobile systems.
Another advantage is high storage density. ReRAM cells can be scaled to very small sizes and are compatible with three-dimensional integration. This makes it a potential replacement for portions of DRAM or even NAND in specialized AI chips where efficiency for specific operations matters more than versatility.
ReRAM is no longer just a lab technology. Experimental prototypes are used in neuromorphic accelerators and matrix coprocessors, where handling neural network weights is paramount. Still, there are limitations: resistance instability, parameter variability, and challenges in precise analog state control.
These issues don't make ReRAM unviable, but they do limit its practical scope. It's not ready to replace DRAM in general-purpose computing, but in specialized AI architectures its advantages can be decisive.
PCM (Phase-Change Memory) operates by changing the phase state of a material, typically a chalcogenide alloy. In its amorphous state, the material has high resistance; in the crystalline state, low resistance. Switching between states is achieved with electrical pulses that heat the material in a controlled way, making PCM fundamentally different from both DRAM and ReRAM at the physical level.
PCM's properties place it between fast volatile working memory and slow non-volatile storage. It's faster than NAND, non-volatile, and can store data more densely than DRAM. This combination made PCM an early candidate for "universal memory" that bridges the gap between speed and capacity.
For AI, PCM is attractive primarily for its state stability. Unlike ReRAM, where resistance can drift, PCM's phase states are more predictable and easier to control, a crucial factor when storing neural network weights that require high precision and repeatability.
PCM is also suitable for partially analog computation. By varying the degree of material crystallization, not only binary but also intermediate values can be encoded, enabling storage of low-precision weights. For AI inference, this offers a tradeoff between accuracy and energy efficiency.
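A rough way to see that tradeoff is to quantize weights to a handful of levels, as a multi-level cell might store them, and measure the error of the resulting matrix-vector product. The sketch below is purely illustrative: uniform levels, no programming noise, arbitrary layer size.

```python
import numpy as np

def quantize_to_levels(W, n_levels):
    """Uniformly quantize weights to n_levels values across their range,
    mimicking a cell with n_levels distinguishable resistance states."""
    w_min, w_max = W.min(), W.max()
    step = (w_max - w_min) / (n_levels - 1)
    return np.round((W - w_min) / step) * step + w_min

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 128))   # hypothetical layer weights
x = rng.normal(size=256)          # one input vector
y_ref = x @ W                     # full-precision reference

for n_levels in (2, 4, 8, 16):    # roughly 1 to 4 bits per cell
    y_q = x @ quantize_to_levels(W, n_levels)
    rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
    print(f"{n_levels:2d} levels -> relative output error {rel_err:.3f}")
```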
However, PCM has drawbacks. Phase switching requires heating, which increases energy consumption and limits write speed. Its rewrite endurance is lower than DRAM's, and thermal effects complicate array scaling. As a result, PCM hasn't replaced DRAM as mass-market working memory.
In practice, PCM has found its niche in specialized accelerators and storage systems focused on AI workloads, where non-volatility and predictability matter more than maximum write speed. It's not a universal solution, but a tool for specific architectural needs.
Directly comparing new memory types to DRAM and NAND is misguided. ReRAM and PCM aren't trying to replace the entire memory hierarchy; their value is in specific, AI-related scenarios. Here, the differences between technologies become practical rather than academic.
DRAM remains the fastest memory for random access, but it's volatile, doesn't scale well in density, and requires constant data exchange with compute blocks. For neural networks, this translates to high latency and massive energy loss every time weights are moved. NAND, on the other hand, is too slow and designed for storage, not computation-making it unsuitable for active model execution.
ReRAM and PCM win not on single-operation speed, but on architecture. They can reduce or eliminate data movement, the main bottleneck in AI systems. In inference tasks, this leads to real gains: lower energy, less heat, and higher computational density per unit of chip area.
ReRAM's strength lies in analog computation and massive parallelism. It's not suited for general-purpose tasks but excels as a matrix accelerator with thousands of cells participating in computation simultaneously. In these scenarios, ReRAM doesn't compete with DRAM; it effectively replaces compute blocks, turning memory into a processor.
PCM, by contrast, offers a compromise. It can serve as non-volatile memory for storing weights between model runs or as a slower, denser alternative to DRAM in AI accelerators. In real systems, it's used where predictability and stability matter-even at the expense of write speed.
In practice, neither ReRAM nor PCM displaces DRAM or NAND entirely. Instead, we see hybrid architectures: DRAM handles control logic, NAND provides storage, and new memory types take on weight storage and in-memory computation. In this role, they are proving genuinely useful.
Despite active research, ReRAM and PCM are still rare in mass-market consumer devices. Their real applications are in specialized AI hardware: architectures designed for specific models and workloads, not for general-purpose use.
Today, ReRAM is most often found in experimental and pre-production AI accelerators focused on in-memory computing. These chips are used for neural network inference with fixed weights, for example in computer vision, signal recognition, and edge devices. Here, energy efficiency and minimal latency matter more than absolute computational accuracy. ReRAM's analog nature is a good fit for tasks where small errors are acceptable.
ReRAM also sees notable use in neuromorphic architectures, where memory arrays directly correspond to synapses and computation happens through the material's physical properties. This enables compact, low-power AI modules that can operate autonomously without active cooling, which is crucial for embedded and edge systems.
PCM, in contrast, has seen some industrial adoption. It has been used in specialized solutions as non-volatile memory with lower latency than NAND. In AI, PCM stores model weights and intermediate data where state retention and fast startup (without slow reload from storage) are important.
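One common way to exploit that fast-startup property is to keep weights in a byte-addressable persistent region and map them directly at process start instead of deserializing them from slow storage. The sketch below uses a memory-mapped file as a stand-in for persistent memory; the path, shapes, and function names are hypothetical.

```python
import numpy as np

WEIGHT_FILE = "/mnt/pmem/model_weights.f32"   # hypothetical persistent-memory-backed file
SHAPE = (4096, 4096)                          # hypothetical layer shape

def save_weights(W, path=WEIGHT_FILE):
    # One-time write; the data survives restarts because the medium is non-volatile.
    np.asarray(W, dtype=np.float32).tofile(path)

def load_weights(path=WEIGHT_FILE, shape=SHAPE):
    # Map the weights in place: no bulk reload into DRAM at startup,
    # pages are pulled in on demand as the model actually touches them.
    return np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Usage sketch:
#   save_weights(trained_weights)   # once, after training
#   W = load_weights()              # at every startup, effectively instant
#   y = x @ W                       # weights read directly from the mapped region
```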
In servers and research systems, PCM is being tested as part of a multi-level memory stack, sitting between DRAM and NAND. This allows large models to be kept closer to compute blocks, reducing data access time during inference, which is especially relevant for massive neural networks whose weights run to tens or hundreds of gigabytes.
It is important to note that in real-world products, ReRAM and PCM almost always work alongside traditional memory. They don't replace DRAM or NAND but complement them, taking on the most demanding AI tasks: weight storage and repeated computation. This hybrid model is currently seen as the most viable approach.
Despite their appeal, ReRAM and PCM are far from universal solutions. Their main challenges aren't with the operating principle, but with materials science and manufacturing complexity-factors that currently hinder mass adoption in AI hardware.
For ReRAM, the key issue is cell variability. Even within a single array, resistance can differ significantly and drift over time due to temperature and repeated write cycles. For analog computation, this means error accumulation and the need for complex calibration. The larger the array, the harder it is to ensure stability and reproducibility.
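The effect of such variability can be illustrated with a simple perturbation model: multiply each programmed conductance by a random device-to-device factor, add a small drift term, and compare the analog result with the ideal one. The noise magnitudes below are assumptions chosen only to show the trend, not measured device data.

```python
import numpy as np

rng = np.random.default_rng(4)

G_ideal = rng.uniform(0.1, 1.0, size=(256, 128))   # programmed conductances (ideal weights)
v = rng.uniform(0.0, 0.2, size=256)                 # read voltages
i_ideal = v @ G_ideal                               # error-free analog result

for sigma in (0.01, 0.05, 0.10):                    # assumed device-to-device spread
    mismatch = rng.normal(1.0, sigma, size=G_ideal.shape)  # multiplicative variation
    drift = 1.0 - 0.02 * rng.random(G_ideal.shape)         # up to 2% downward drift (assumed)
    i_real = v @ (G_ideal * mismatch * drift)
    rel_err = np.linalg.norm(i_real - i_ideal) / np.linalg.norm(i_ideal)
    print(f"sigma = {sigma:.2f} -> relative output error {rel_err:.3f}")
```

Keeping this error bounded as arrays grow is what drives the calibration and error-compensation overhead described above.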
PCM faces different constraints. Phase switching requires heating, increasing energy consumption and complicating chip thermal management. PCM's write endurance is lower than DRAM's, making it less suitable for intensive training workloads, and density scaling is limited by thermal crosstalk between adjacent cells.
Cost is another challenge. Fabricating ReRAM and PCM involves new materials, extra process steps, and stringent process control, making such chips more expensive than traditional memory while production volumes remain low. Until these technologies reach mass production, their use is economically justified only in niche AI accelerators.
Software is also a factor. In-memory computing demands new programming models, compilers, and algorithms adapted to the approximate and analog nature of computation. Without a supporting software ecosystem, even the most effective memory remains an experiment, not a product.
As a result, ReRAM and PCM currently occupy a space of technological compromise. They have proven their usefulness for AI, but scaling is limited by physics, cost, and integration complexity. That's why the industry is moving not toward replacing all memory, but toward carefully embedding these technologies into hybrid architectures.
Emerging memory types like ReRAM and PCM have arisen not as the next step in DRAM or NAND evolution, but as a response to the fundamental limits of AI hardware. In neural network workloads, the bottleneck has shifted from computation to data movement, where traditional architectures no longer scale.
ReRAM has proven itself an effective tool for in-memory computing and neuromorphic architectures, enabling radical energy reductions and bringing weight storage closer to computation. However, it requires complex calibration and isn't suited for general-purpose tasks. PCM has carved out a niche as stable non-volatile memory between DRAM and NAND, useful for weight storage and speeding up inference in specialized AI systems.
Experience shows that neither technology will replace traditional memory entirely. The true future lies in hybrid architectures where DRAM, NAND, ReRAM, and PCM coexist, each solving a specific challenge. This approach delivers maximum efficiency without trying to force a single technology to be universal.
For AI hardware, this marks a paradigm shift: memory is no longer a passive component but an active part of computation. In this role, new memory types are already finding their place-not in marketing hype, but in real architectural solutions.