T. Windbacher, A. Makarov, V. Sverdlov and S. Selberherr
Institute for Microelectronics, TU Wien, 1040 Vienna, Austria
After many decades of stunning progress in the shrinking of complementary metal-oxide-semiconductor (CMOS) devices, the steadily increasing difficulty in handling physical limitations, together with the rapidly increasing production and investment costs for each new technology generation, will stop CMOS scaling in the not-too-distant future. Among the most challenging problems for further performance gains today are the static power dissipation as well as the interconnection delay and the associated energy for information transport [1, 2]. A very efficient solution to the static leakage power problem is to simply turn off unused parts of a circuit. However, this causes the previously stored information to vanish and requires energy- and time-wasting recovery cycles when the dormant circuit parts are powered up. Thus, in order to avoid information loss during shutdown, nonvolatile elements must be incorporated.
Due to its CMOS compatibility, nonvolatility, high endurance, and fast operation, spintronics is a promising avenue for adding nonvolatility to circuits [3]. The term spintronics is very general and covers a vast number of devices with an extreme variety in operating principles and practical feasibility for commercial applications [3, 4]. In this chapter, we concentrate on what, in our opinion, appears to be the most feasible technology for large-scale integration in the next few years: the combination of CMOS with nonvolatile magnetoresistive random-access memory (MRAM). Indeed, the integration of CMOS and magnetic tunnel junctions (MTJs) is not merely likely but already available in the form of nonvolatile stand-alone MRAM arrays and embedded DRAM [5], and the introduction of further commercial products will surely follow [6–9].
Importantly, the all-electrical magnetization manipulation in modern MRAM by spin transfer torque (STT) renders the wires for separate magnetic field generation superfluous and also significantly reduces the MTJ switching energy. Technological advances, such as the exploitation of free-layer perpendicular magnetic anisotropy and the use of MgO tunnel barriers, have led to a further reduction in switching energy, as well as improved scalability [10]. In terms of speed and power consumption, promising spintronic solutions have been able to compete with CMOS-only solutions [9, 11].
However, the usual CMOS/MTJ hybrid structures use MTJs only for storage, while the actual computation is still carried out via CMOS logic. The addition of MTJs also increases circuit complexity and footprint, as extra transistors are needed to read and write the MTJs to access the stored memory data.
Shifting the actual computation into the magnetic domain would not only simplify the layout but also increase the integration density. We propose a universal nonvolatile processing environment, schematically illustrated in Fig. 1, consisting of spin torque majority gates (STMGs) [12] and nonvolatile magnetic flip-flops [13], illustrated in Figs 2 and 3, respectively. By arranging the STMGs and flip-flops in an array, the flip-flops can be exploited as shared buffers available for neighboring STMGs. To elaborate the concept further, a possible realization of an easily extendable 1-bit full adder using just a single STMG and three nonvolatile flip-flops is explained in the following section.
To fully understand the operation of the nonvolatile processing environment, one first has to address the operation principle of the devices as well as how the information is transferred between them. As illustrated in Figs 2 and 3, the devices are basically spin valves (nonmagnetic metal interconnection layers) or MTJs (oxide interconnection layers) with a common free layer. These devices are operated via current pulses, and the two polarities of the applied pulse represent the two logic states “0” and “1,” respectively. The flip-flop has two inputs, A and B, so four input combinations are possible. Depending on the applied pulse polarity in relation to the free layer orientation, the acting torque either drives growing precessions, which may lead to a change in the magnetization orientation, or it damps the magnetization movement and tries to keep it in its current position. In the case of two synchronous input pulses with the same polarity, two spin-polarized currents enter the free layer and cause two STTs acting in the same fashion. They add up and cause either the set or the reset of the common free layer orientation. On the other hand, if two synchronous input pulses with opposing polarities are applied, the two torques compensate each other and the initial magnetization orientation is preserved (hold operation). The STMG has the same operation principle, but instead of two input pulses there are always three synchronous pulses. Thus, for any input combination there is always at least one uncompensated STT contribution that prevails and determines the final state of the common free layer. This behavior replicates the majority function [12] and can facilitate combinational logic. Note that the majority function additionally requires logic negation to form a computationally complete basis. The simplest solution is to assume that the not operation is carried out by inverting the polarity of the corresponding input pulse.
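The set, reset, and hold behavior of the flip-flop and the majority behavior of the STMG described above can be sketched as a purely logical model. This is a minimal illustration, not a device simulation: the function names `flip_flop`, `stmg`, and `inv` are hypothetical, the mapping of pulse polarity to the bits 0 and 1 is an assumption, and all magnetization dynamics and timing are ignored. The use of constant inputs to obtain AND and OR is a standard property of the majority function and is not taken from the chapter.

```python
def inv(x):
    """Logic negation, modeled as inverting the polarity of an input pulse."""
    return 1 - x


def flip_flop(state, a, b):
    """Two-input nonvolatile flip-flop (logical model only).

    Equal polarities: both torques add up and set (1) or reset (0) the free layer.
    Opposing polarities: the torques compensate and the stored state is held.
    """
    if a == b:
        return a
    return state


def stmg(a, b, c):
    """Three-input spin torque majority gate: the prevailing polarity wins."""
    return 1 if a + b + c >= 2 else 0


# Flip-flop: set, reset, and hold for both initial states.
for state in (0, 1):
    print("state", state,
          "set ->", flip_flop(state, 1, 1),
          "reset ->", flip_flop(state, 0, 0),
          "hold ->", flip_flop(state, 1, 0))

# Majority plus negation is computationally complete. With one input tied to a
# constant (an assumption of this sketch), the gate acts as AND or OR.
for a in (0, 1):
    for b in (0, 1):
        assert stmg(a, b, 0) == (a & b)        # AND
        assert stmg(a, b, 1) == (a | b)        # OR
        assert inv(stmg(a, b, 0)) == 1 - (a & b)  # NAND
```

With three inputs there is never a tie, which is why the STMG has no hold case in this model.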
Finally, it is necessary to elucidate how information is transferred between these devices. As already pointed out above, the key to the operation is the orientation and interaction of the applied torques. When a current pulse is applied to one of the overlapping regions between neighboring devices, the electrons entering the first common free layer are polarized along the first layer's magnetization orientation (see Fig. 4). When they cross over into the second free layer, they relax to the magnetization orientation in the second free layer and thereby create an STT encoded with the information stored in the first free layer. Since it takes much longer for a free layer to switch with one active input than when all inputs are active, there is a safe time window for copying information from a free layer without switching its magnetization.
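The assumed 1-bit full adder has three inputs A, B, and CIN and two outputs Sum and COUT. The logic function for the sum is given by [14]

$$\mathrm{Sum} = A \oplus B \oplus C_{\mathrm{IN}}$$

and the carry-out bit COUT is defined as

$$C_{\mathrm{OUT}} = A \cdot B + A \cdot C_{\mathrm{IN}} + B \cdot C_{\mathrm{IN}} = \mathrm{majority}[A, B, C_{\mathrm{IN}}].$$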
Due to the computational completeness of the proposed processing environment, any logic function can be transformed into a sequence of well-defined majority and not operations. For instance, in a first step one calculates majority[A, B, CIN], with the resulting carry-out bit COUT subsequently copied into one of the adjacent flip-flops (e.g., FF1). In the next step, majority[A, B, not(CIN)] is performed and stored in another flip-flop (FF2). Finally, the sum bit is obtained by executing majority[not(FF1), FF2, CIN] and moving the result into FF3. Thus, COUT and Sum are stored and accessible via FF1 and FF3, respectively. Since we chose to perform the calculation of COUT in the very first step, it can already be used as CIN by FF1's neighboring STMG even before the calculation of Sum is finished. In this way, the calculation is parallelizable, and the exploitation of the flip-flops as shared local buffers significantly reduces information transport over the global bus.
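The three-step sequence above can be checked exhaustively with a short logical sketch. The names `majority`, `inv`, `full_adder`, and `ripple_add` are hypothetical helpers introduced only for illustration, and the model captures the logic alone: pulse timing and the copy operations between free layers are not represented. The ripple-carry chain is an assumed way of extending the adder, based on the remark that COUT can serve as CIN for the neighboring STMG.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function, as realized by one STMG operation."""
    return 1 if a + b + c >= 2 else 0


def inv(x):
    """Negation, modeled as an inverted input pulse polarity."""
    return 1 - x


def full_adder(a, b, c_in):
    """1-bit full adder as three sequential majority operations."""
    ff1 = majority(a, b, c_in)            # step 1: COUT, copied into FF1
    ff2 = majority(a, b, inv(c_in))       # step 2: intermediate result in FF2
    ff3 = majority(inv(ff1), ff2, c_in)   # step 3: Sum, moved into FF3
    return ff3, ff1                       # (Sum, COUT)


def ripple_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = 0
    sum_bits = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)  # COUT feeds the next stage as CIN
        sum_bits.append(s)
    return sum_bits, carry


# Exhaustive check against ordinary binary addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in

# 3 + 5 = 8, with bits given least significant first.
print(ripple_add([1, 1, 0], [1, 0, 1]))  # ([0, 0, 0], 1)
```

Because `ff1` is available after the first step, a neighboring stage could in principle start its own first step while the current stage is still computing Sum, which is the parallelism described in the text.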
A further essential building block in modern electronics is the oscillator [14]. Unfortunately, spin-torque oscillators often require an external magnetic field or operate only at relatively low frequencies, which limits their practical implementation. Previously, we demonstrated that the nonvolatile magnetic flip-flop device intrinsically provides a spin-torque oscillator [15]. It operates without an external magnetic field at high frequencies and perfectly complements the proposed nonvolatile processing environment, thereby boosting the achievable integration density. In direct comparison to a CMOS ring oscillator (see Fig. 5) at an assumed half pitch of 15 nm, our proposed structure is approximately 30 times smaller [3, 15].
In order to boost the output power as well as the operation frequency, a structure comprising two three-layer MTJ stacks with a shared free layer was proposed (see Fig. 6) [16]. The oscillation frequency of the structure can be tuned by varying the amplitude of the applied current density of one of the MTJs, while the other one is kept fixed, as shown in Fig. 7. In this way, frequencies of up to 30 GHz can be excited [17].
Furthermore, the oscillation spectrum contains a primary mode at frequency f and a secondary mode at frequency f/2. By increasing the applied current density j_B through MTJ_B, the amplitude of the secondary mode starts to increase and eventually reaches more than half the maximum amplitude of the primary mode, whereas the output power of the primary mode decreases. There is also a significant dependence of the excited modes on the geometry of the free layer, shown in Fig. 8. Structures with free layer lengths ranging from 40 to 60 nm and current density combinations between 10^7 and 2.05 × 10^8 A/cm^2 have been investigated.
Increasing the length of the free layer, and thereby the distance between the MgO-MTJs, shifts the region where the additional large-amplitude oscillation mode appears toward larger j_A/j_B ratios. Thus, the structure with a 40 nm free layer length shows the largest variation in current density over which the additional mode does not reach large amplitudes.
The resulting nonvolatile processing environment features a highly regular structure, is computationally complete, and reduces the information transport due to its shared buffers. Thus, it is a viable candidate for a universal post-CMOS logic technology. The flip-flop is very versatile: the same device can be used stand-alone or stacked for even higher integration density. It also offers the possibility of being used as a bias-field-free oscillator while preserving an extremely small footprint. For these reasons, we believe the presented processing environment and its components are very promising candidates for pushing the achievable integration density beyond the state-of-the-art CMOS-MTJ hybrids, while keeping the dissipated power and interconnection delay in check.
This research is supported by the European Research Council through the Grant #247056 MOSILSPIN.