# PRECISION OF SUM-OF-PRODUCTS IN ANALOG NEURAL NETWORKS

Michel Verleysen, Paul Jespers

Université Catholique de Louvain, Laboratoire de Microélectronique, 3, pl. du Levant, 1348 Louvain-la-Neuve, Belgium

## 1. INTRODUCTION

VLSI implementations of analog neural networks have been extensively investigated during the last five years. Except for some specific realizations where precision and the adaptation rule matter more than the size of the network [1] [2], most applications of neural networks require large arrays of neurons and synapses.

The fan-out of the neuron is not the crucial point: digital or analog neurons can easily be designed to drive a large number of synapse inputs (in the next layer for multi-layered networks, or in the same layer for feedback networks). Fan-in is more important: whatever the mode of transmitting information between synapses and neurons (voltage, current, pulses, ...), the neuron input must have a large dynamic range if it is connected to hundreds of synapses.

Digital neurons are of course an obvious solution: if the dynamic range of the neuron inputs has to be increased, more bits are used and the required precision is obtained. However, digital cells are in general much larger than their analog counterparts: for example, a neuron connected to 100 synapses must contain a digital adder with 100 inputs, each coded on several bits. The silicon area occupied by the cells and by the connections between them is then incompatible with the integration of a large number of synapses and neurons on a single chip.

## 2. ANALOG NEURONS

To compensate for this lack of efficiency, analog cells are used in VLSI neural networks. One easy way to transmit information between synapses and neurons is to use currents: all the excitatory and inhibitory currents coming from the synapses connected to the same neuron are then summed on one or two lines, and the connection problem no longer exists.
To avoid differences between the excitatory and inhibitory currents, and to suppress possible mismatch between N-type and P-type current sources, we choose an architecture where excitatory and inhibitory currents are generated by the same type of transistors [3] [4]. Two lines are therefore necessary: one to sum the excitatory currents and one to sum the inhibitory ones. The two total currents, which have identical signs, are compared in the neuron. This can be done by a load or a current mirror associated with a voltage comparator, as illustrated in fig. 1.

Figure 1: two-line system

### 2.1 Two-transistor load

To convert the two currents into voltages in a two-transistor load, two simple solutions can be considered: two loads (fig. 2.a) or a current mirror (fig. 2.b).

Figure 2.a: two loads

Figure 2.b: current mirror

When many current sources are connected to the neuron, the load must be able to discriminate small current differences between the two lines (i.e. one synaptic current), whatever the common-mode current in these lines. For example, the neuron must behave the same way whether one excitatory and two inhibitory currents are connected, or 500 excitatory and 501 inhibitory ones. The ability of the neuron to discriminate currents naturally increases with the differential gain of the load, defined here as the voltage difference between the two lines for a given current difference at the input of the load. This gain is easily computed by assuming that the transistors are in saturation and neglecting second-order effects; the current is then expressed by:

$$I = \mu C_{ox} \frac{W}{L} \frac{(V_{gs} - V_t)^2}{2}$$

For the option with two loads (fig. 2.a), we have:

$$\Delta I = \mu C_{ox} \frac{W}{L} \left( \frac{(V_{gs1} - V_t)^2}{2} - \frac{(V_{gs2} - V_t)^2}{2} \right)$$

$$= \mu C_{ox} \frac{W}{L} \left( \frac{V_{gs1}^2 - V_{gs2}^2}{2} + V_t (V_{gs2} - V_{gs1}) \right)$$

$$= \mu C_{ox} \frac{W}{L} (V_{gs1} - V_{gs2}) \left( \frac{V_{gs1} + V_{gs2}}{2} - V_{t} \right)$$

so that, with $\Delta V = V_{gs1} - V_{gs2}$:

$$G = \frac{\Delta V}{\Delta I} = \frac{1}{\mu C_{ox} \frac{W}{L}} \frac{1}{\frac{V_{gs1} - V_t}{2} + \frac{V_{gs2} - V_t}{2}} \tag{1}$$

For the option with the current mirror (fig. 2.b), the same current would flow into the two transistors if there were no Early effect. The voltage shift is thus determined only by the Early effect:

$$\Delta V = \Delta I \frac{1}{\frac{I_1}{V_{EAp}} + \frac{I_2}{V_{EAn}}}$$

where $V_{EAp}$ is the Early voltage of the P-type transistors and $V_{EAn}$ the Early voltage of the current sources driving $I_2$. The gain is thus given by:

$$G = \frac{\Delta V}{\Delta I} = \frac{1}{\frac{I_1}{V_{EAp}} + \frac{I_2}{V_{EAn}}} = \frac{1}{\mu C_{ox} \frac{W}{L}} \frac{1}{\frac{(V_{gs1} - V_t)^2}{2 V_{EAp}} + \frac{(V_{gs2} - V_t)^2}{2 V_{EAn}}} \tag{2}$$

Comparing (1) and (2), and assuming

$$\frac{V_{gs} - V_t}{V_{EA}} \ll 1,$$

the gain in fig. 2.b is clearly larger than the gain in fig. 2.a; for $V_{gs1} \approx V_{gs2}$ and equal Early voltages, the ratio of (2) to (1) is about $V_{EA}/(V_{gs} - V_t)$.

Once the gain has been computed, the precision of the mirror must be examined. Because of oxide gradients or other technological imperfections, the threshold voltages of two transistors are never exactly the same; their $\beta$ factors ($\beta = \mu C_{ox} W/L$) can also differ. The impact of a difference in threshold voltages can be expressed by:

$$I_2 = \frac{\beta}{2} (V_{gs1} - V_t + \Delta V_{tm})^2$$

where $V_t$ is the threshold voltage of the transistors in the current mirror and $\Delta V_{tm}$ the possible difference between the threshold voltages of the two transistors.
But

$$V_{gs1} = V_t + \sqrt{\frac{2 I_1}{\mu C_{ox} \frac{W}{L}}}$$

thus

$$I_2 = \mu C_{ox} \frac{W}{L} \frac{1}{2} \left( \sqrt{\frac{2 I_1}{\mu C_{ox} \frac{W}{L}}} + \Delta V_{tm} \right)^2$$

$$\cong I_1 + \mu C_{ox} \frac{W}{L} \sqrt{\frac{2 I_1}{\mu C_{ox} \frac{W}{L}}} \, \Delta V_{tm}$$

(neglecting second-order terms). The resulting current error $\Delta I_{tm}$ is thus given by:

$$\Delta I_{tm} \cong \sqrt{2 \mu C_{ox} \frac{W}{L} I_1} \, \Delta V_{tm}$$

$$\cong \frac{2 I_1}{V_{gs} - V_{t}} \Delta V_{tm}$$

The effect of $\beta$ variations is expressed by:

$$\Delta I_{\beta} = \frac{\Delta \beta}{\beta} I_{1}$$

A third error to consider is the mismatch between the two input transistors of the differential amplifier that measures the voltage difference between the $V_{gs}$ of the two transistors in the mirror (fig. 3). This error is given by:

$$\Delta I_{td} = \frac{\Delta V_{td}}{V_{EA}} I_1$$

Figure 3: mismatching of the voltage comparator

To compare these three errors, realistic values are chosen:

- $I$: from 0 to 500 µA;
- $\mu_p C_{ox}$: $1.5 \cdot 10^{-5}$ A/V² (standard CMOS process);
- $W/L$: must be chosen to cope with the maximum current (500 µA); if $V_{gs} - V_t$ can be up to 1.5 V, then $W/L = I_{max} / \left( \mu_p C_{ox} (V_{gs} - V_t)^2 / 2 \right) \cong 30$;
- $V_{EAn}$: 20 V;
- $\Delta V_{tm}$: 10 mV;
- $\Delta\beta/\beta$: 0.01;
- $\Delta V_{td}$: 10 mV.

The last three values can be reached with careful design of the mirrors and comparators. The three currents $\Delta I_{tm}$, $\Delta I_{\beta}$ and $\Delta I_{td}$ are plotted in fig. 4. The error due to the threshold voltage difference in the current mirror ($\Delta I_{tm}$) is clearly the largest, especially for small currents. This is because this error is proportional to the square root of the size ($W/L$) of the transistors in the mirror.
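As a rough numerical check, the expressions above can be evaluated with the parameter values just listed. The Python sketch below is our own illustration, not the authors' simulation: it assumes $V_{EA} = V_{EAn} = 20$ V for both the mirror gain and the comparator error, uses the rounded $W/L = 30$, and computes the gains (1) and (2) at equal branch currents together with the three error currents of fig. 4 for a few values of $I_1$. The function names are hypothetical.

```python
import math

# Parameter values from section 2.1.
# Assumption: V_EA = V_EAn = 20 V is used for every Early-voltage term.
MU_COX = 1.5e-5   # mu_p * C_ox in A/V^2
I_MAX = 500e-6    # maximum load current in A
V_OV_MAX = 1.5    # largest allowed V_gs - V_t in V
V_EA = 20.0       # Early voltage in V
DV_TM = 10e-3     # threshold mismatch in the mirror, V
DBETA = 0.01      # relative beta mismatch
DV_TD = 10e-3     # comparator input offset, V


def wl_for(i_max, v_ov_max):
    """W/L that lets i_max flow at the largest allowed overdrive."""
    return i_max / (MU_COX * v_ov_max ** 2 / 2)


def gain_two_loads(wl, i1, i2):
    """Eq. (1): Delta V / Delta I for two separate diode-connected loads."""
    beta = MU_COX * wl
    v1 = math.sqrt(2 * i1 / beta)  # V_gs1 - V_t
    v2 = math.sqrt(2 * i2 / beta)  # V_gs2 - V_t
    return 1 / (beta * (v1 / 2 + v2 / 2))


def gain_mirror(i1, i2, v_eap=V_EA, v_ean=V_EA):
    """Eq. (2): gain of the current mirror, set only by the Early effect."""
    return 1 / (i1 / v_eap + i2 / v_ean)


def mirror_errors(i1, wl):
    """Return (dI_tm, dI_beta, dI_td) in A for a single mirror of size wl."""
    d_tm = math.sqrt(2 * MU_COX * wl * i1) * DV_TM
    d_beta = DBETA * i1
    d_td = DV_TD / V_EA * i1
    return d_tm, d_beta, d_td


WL = 30  # wl_for(I_MAX, V_OV_MAX) is about 29.6; the text rounds it to 30

print(f"W/L needed: {wl_for(I_MAX, V_OV_MAX):.1f}")
i_eq = 250e-6
print(f"G two loads: {gain_two_loads(WL, i_eq, i_eq):.0f} ohm")
print(f"G mirror:    {gain_mirror(i_eq, i_eq):.0f} ohm")
for i1 in (10e-6, 50e-6, 100e-6, 250e-6, 500e-6):
    tm, b, td = mirror_errors(i1, WL)
    print(f"I1 = {i1 * 1e6:5.0f} uA: dI_tm = {tm * 1e6:.2f} uA, "
          f"dI_beta = {b * 1e6:.2f} uA, dI_td = {td * 1e6:.3f} uA")
```

With these values the two-load gain is roughly 2.1 kΩ against 40 kΩ for the mirror. At 500 µA the sketch gives about 6.7 µA for $\Delta I_{tm}$, 5 µA for $\Delta I_{\beta}$ and 0.25 µA for $\Delta I_{td}$; at 10 µA the mismatch term (about 0.95 µA) is nearly ten times $\Delta I_{\beta}$, which is the trend described for fig. 4.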
Even for small currents, this error is thus significant, because these two transistors must be large enough to drive the maximum current (here 500 µA).

Figure 4: errors for current mirror

One solution to this problem would be to connect several mirrors in parallel, each of them being activated only when needed to drive the total current. This solution is considered in section 2.2.

### 2.2 Multi-transistor load

A load where several mirrors are connected in parallel through switches is now considered (fig. 5). The switches are assumed to be activated sequentially, depending on the value of the larger of the two currents $I_1$ and $I_2$; in other words, if this current is $I_M$, we have:

- $0 < I_M \le I_{ref} \;\rightarrow\;$ 1 load active;
- $I_{ref} < I_M \le 2 I_{ref} \;\rightarrow\;$ 2 loads active.

Here $I_{ref}$ is a given current, which will be estimated at the end of this paper. To highlight the advantages of such a solution, the same three errors computed in section 2.1 are now estimated for a load as described in fig. 5, with two current mirrors. We first assume that the switches have no influence on these errors, and that $I_{ref} = I_{max}/2$, where $I_{max}$ is the maximum current in the load (here 500 µA). Since the maximum current $I_{max}$ is unchanged, the size of each current mirror can be reduced to $W/L = 15$. All other parameter values are identical to those in section 2.1. Errors $\Delta I_{\beta}$ and $\Delta I_{td}$ do not change; error $\Delta I_{tm}$, however, depends on the $W/L$ of the transistors in the mirrors.
As long as only one load is active, $\Delta I_{tm}$ is given by:

$$\Delta I_{tm} = \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 I_1} \, \Delta V_{tm}$$

where $(W/L)_2 = 15$. When the two loads are active, each carries $I_1/2$, and the error $\Delta I_{tm}$ is given by:

$$\Delta I_{tm} = 2 \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 \frac{I_1}{2}} \, \Delta V_{tm}$$

Figure 5: current mirrors in parallel

The three errors $\Delta I_{tm}$, $\Delta I_{\beta}$ and $\Delta I_{td}$ are illustrated in fig. 6.

Figure 6: errors for a load with two mirrors

Two remarks can be made. First, the error $\Delta I_{tm}$ at $I_1 = I_{max}$ is the same for the devices of fig. 2.b and fig. 5. If the $W/L$ of the mirror transistors are respectively $(W/L)_1$ and $(W/L)_2$, we have in the first case

$$\Delta I_{tm} = \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_1 I_{max}} \, \Delta V_{tm}$$

and in the second case

$$\Delta I_{tm} = 2 \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 \frac{I_{max}}{2}} \, \Delta V_{tm}$$

Since $(W/L)_1 = 2 (W/L)_2$, these errors are identical. However, the error $\Delta I_{tm}$ for $I_1 < I_{ref}$ is smaller in the second case, because the transistors are used more efficiently (a larger current flows in the transistors with respect to their size).

Second, the error $\Delta I_{tm}$ for $I_1 > I_{ref}$ is identical in the two situations, since the two devices are equivalent when all the switches are on (the switches are still assumed to have no influence on the errors). Furthermore, fig. 6 shows a discontinuity in the curve of $\Delta I_{tm}$ at $I_1 = I_{ref}$. The reduction of the error $\Delta I_{tm}$ can thus be improved further if this discontinuity is suppressed; a solution to this problem is presented in section 2.3.

### 2.3 Multi-transistor load with maximum current

The discontinuity in fig. 6 can be suppressed if one of the two loads receives its maximum current ($I_{ref}$) and the other one the remaining current ($I_1 - I_{ref}$) (see fig. 7).
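The effect of this allocation rule can be checked with a short Python sketch. It is an illustration under the simplifying assumptions of section 2.2 (ideal switches, only the threshold-mismatch term, worst-case same-sign addition of the per-mirror errors), with $(W/L)_2 = 15$, $I_{ref} = I_{max}/2 = 250$ µA and the other values of section 2.1. The function names are hypothetical: `error_equal_share` follows section 2.2, where both mirrors split $I_1$ once $I_1 > I_{ref}$, and `error_fill_first` follows the rule just described, where the first mirror is held at $I_{ref}$ and the second takes the remainder.

```python
import math

# Assumptions: ideal switches, only the threshold-mismatch error,
# and worst-case (same-sign) addition of the errors of the two mirrors.
MU_COX = 1.5e-5   # mu_p * C_ox in A/V^2
DV_TM = 10e-3     # threshold mismatch of each mirror, V
WL_STAGE = 15     # (W/L)_2 of each of the two mirrors
I_REF = 250e-6    # I_ref = I_max / 2 with I_max = 500 uA


def stage_error(i_stage, wl=WL_STAGE):
    """Threshold-mismatch error of one mirror carrying i_stage."""
    return math.sqrt(2 * MU_COX * wl * i_stage) * DV_TM


def error_equal_share(i1):
    """Section 2.2: above I_ref both mirrors are on and each carries I_1 / 2."""
    if i1 <= I_REF:
        return stage_error(i1)
    return 2 * stage_error(i1 / 2)


def error_fill_first(i1):
    """Section 2.3: the first mirror carries up to I_ref, the second the remainder."""
    if i1 <= I_REF:
        return stage_error(i1)
    return stage_error(I_REF) + stage_error(i1 - I_REF)


for i1 in (200e-6, 250e-6, 251e-6, 300e-6, 400e-6, 500e-6):
    print(f"I1 = {i1 * 1e6:3.0f} uA: equal share {error_equal_share(i1) * 1e6:.2f} uA, "
          f"fill first {error_fill_first(i1) * 1e6:.2f} uA")
```

Just above $I_{ref}$ the equal-share error jumps by a factor of $\sqrt{2}$ (from about 3.35 µA to about 4.74 µA), whereas the fill-first error starts from the same 3.35 µA and grows continuously; both reach the single-mirror value of about 6.71 µA at $I_{max}$. Because the square root is concave, the fill-first sum never exceeds the equal-share one between $I_{ref}$ and $I_{max}$.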
Figure 7: loads with maximum current

The error $\Delta I_{tm}$ is then given by:

$$\Delta I_{tm} = \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 I_1} \, \Delta V_{tm} \qquad \text{if } I_1 \leq I_{ref}$$

$$\Delta I_{tm} = \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 I_{ref}} \, \Delta V_{tm} + \sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_2 (I_1 - I_{ref})} \, \Delta V_{tm} \qquad \text{if } I_1 > I_{ref}$$

The three errors $\Delta I_{tm}$, $\Delta I_{\beta}$ and $\Delta I_{td}$ are illustrated in fig. 8.

Figure 8: errors for a load with maximum currents

## 3. VLSI NEURON

The solution of section 2.3 can be used to implement a VLSI neuron for an artificial neural network in which the number of synapses connected to a single neuron is large. To avoid changes in the current flowing through the synaptic current sources, an operational amplifier in a feedback loop is introduced, as shown in fig. 9. In this way, the drain voltage of the current sources is kept fixed, and the synaptic currents remain the same whatever the total current in the load. Furthermore, with this feedback loop, the current in the two lines is directly determined by the synapses, so parameter variations in the switches have only a second-order effect.

Figure 9: fixed synaptic current

The principle explained in section 2.3 can be extended to more than two stages. In that case, the first stage is always connected to the sources, while the second one is connected only when $I_M = \max(I_1, I_2)$ is greater than $I_{ref}$, the third one when $I_M$ is greater than $2 I_{ref}$, and so on. An efficient value of $I_{ref}$ is computed in section 4. The switches that connect the successive loads can be driven by a device as illustrated in fig. 10.

Figure 10: command for switches

The ratio $n{:}1$ used in the N-type current mirror depends on the current $n \cdot I_{ref}$ to which $I_M$ must be compared.
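The switching rule for more than two stages can be summarized behaviourally. In the Python sketch below (our illustration, not a circuit model of fig. 10), comparator $k = 1, \dots, n-1$ switches stage $k+1$ on when $I_M > k \cdot I_{ref}$, the currents are allocated as in section 2.3, and only the threshold-mismatch error is evaluated, with each stage sized $(W/L)_{tot}/n$. The function names, the worst-case error addition and the example choice $n = 4$, $(W/L)_{tot} = 30$ are assumptions.

```python
import math


def active_stages(i_m, i_ref, n_stages):
    """Stage 1 is always connected; comparator k enables stage k + 1 when i_m > k * i_ref."""
    return 1 + sum(1 for k in range(1, n_stages) if i_m > k * i_ref)


def stage_currents(i1, i_ref, n_stages):
    """Fill-first allocation (section 2.3): every active stage but the last carries i_ref."""
    k = active_stages(i1, i_ref, n_stages)
    return [i_ref] * (k - 1) + [i1 - (k - 1) * i_ref]


def mismatch_error(i1, i_ref, n_stages, wl_tot, mu_cox=1.5e-5, dv_tm=10e-3):
    """Worst-case (same-sign) sum of the per-stage threshold-mismatch errors."""
    wl_stage = wl_tot / n_stages
    # max(..., 0.0) guards against a tiny negative remainder from rounding
    return sum(math.sqrt(2 * mu_cox * wl_stage * max(i, 0.0)) * dv_tm
               for i in stage_currents(i1, i_ref, n_stages))


# Hypothetical example: four stages sharing the W/L = 30 of the single mirror.
I_MAX, N, WL_TOT = 500e-6, 4, 30
I_REF = I_MAX / N
for i1 in (100e-6, 125e-6, 200e-6, 250e-6, 300e-6, 500e-6):
    print(f"I1 = {i1 * 1e6:3.0f} uA: {active_stages(i1, I_REF, N)} stage(s), "
          f"dI_tm = {mismatch_error(i1, I_REF, N, WL_TOT) * 1e6:.2f} uA")
```

In this example the error at $2 I_{ref}$ is exactly twice the error at $I_{ref}$, and at $I_{max}$ it equals the single-mirror value of about 6.71 µA, in line with the statements made about the switching points in sections 2.2 and 4.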
This device needs the current $I_M$ as input; this could be provided by inserting another cell that generates the maximum of the currents $I_1$ and $I_2$. However, $I_M$ can be replaced by $I_1$ or $I_2$ without loss of performance. Indeed, it does not matter much if the loads are activated only at approximately $I_{ref}$, $2 I_{ref}$, $3 I_{ref}$, ... rather than exactly at these values. If the currents $I_1$ and $I_2$ are quite different, the circuit works properly if the one chosen to drive the cell of fig. 10 happens to be the larger. If this is not the case, say $I_1$ is used in place of $I_M$ while $I_1 \ll I_2$, only the number of loads needed to drive $I_1$ properly will be active. The transistors carrying $I_2$ will not be sufficient for such a current, and their drain voltage will drop dramatically. In this case it is easy to discriminate between $I_1$ and $I_2$ with a simple comparator; the current $I_M$ can therefore be replaced, for example, by $I_1$.

An important point is to avoid duplicating the current $I_1$ at the output of the synapses; the advantage of the circuit would otherwise be lost because of the imperfections of the mirrors used to duplicate the current. A second, less precise copy of $I_1$ therefore has to be generated directly in the synapses. The complete circuit is shown in fig. 11; all the mirrors have unity ratios, except those indicated in the figure.

Figure 11: complete neuron

## 4. NUMBER OF STAGES IN THE NEURON

The last question to solve is the choice of the current $I_{ref}$, and hence of the optimum number of stages in the neuron. First, the number $n$ of stages in the neuron and the current $I_{ref}$ are related by:

$$n \cdot I_{ref} \leq I_{max} < (n+1) \cdot I_{ref}$$

where $I_{max}$ is the maximum current the neuron has to drive. The link between this architecture for computing analog sums of products and neural networks can now be re-established.
The function to realize is the sum of fixed synaptic currents, followed by the logic comparison between the total excitatory and inhibitory currents. If $I_{syn}$ is a single synaptic current, the error $\Delta I_{tot} = \Delta I_{tm} + \Delta I_{\beta} + \Delta I_{td}$ has no influence on the logic comparison as long as $\Delta I_{tot} < I_{syn}$. This relation can be developed as:

$$\sqrt{2 \mu C_{ox} \frac{W}{L} I} \, \Delta V_{tm} + \frac{\Delta \beta}{\beta} I + \frac{\Delta V_{td}}{V_{EA}} I < I_{syn}$$

It was shown in section 2.2 that introducing several mirrors does not change the error when $I = I_{max}$. It can also be shown that the errors computed at the current values that switch on a new stage are proportional to this current. The optimum number of stages is thus:

$$n = \mathrm{int} \left( \frac{\sqrt{2 \mu C_{ox} \left(\frac{W}{L}\right)_{tot} I_{max}} \, \Delta V_{tm} + \frac{\Delta \beta}{\beta} I_{max} + \frac{\Delta V_{td}}{V_{EA}} I_{max}}{I_{syn}} + 1 \right)$$

where $(W/L)_{tot}$ is the size of a transistor that could drive the total current $I_{max}$. The value of $I_{ref}$ is then given by:

$$I_{ref} = \frac{I_{max}}{n}$$

This value of $I_{ref}$ corresponds to a maximum error of $I_{syn}$. It would probably be useful to include a safety factor on the allowed error, i.e. to replace $I_{syn}$ by $0.9 \, I_{syn}$. It is easily verified that if the error at a current $I$ is less than $I_{syn}$, then the error at a current $2I$ is less than $2 I_{syn}$, and so on. This value of $I_{ref}$ is optimum, because the current for which the error stays below $I_{syn}$ is maximum (and likewise for $2 I_{syn}$, ...). It would be pointless to increase the number of stages, and thus to decrease $I_{ref}$: the error current would indeed decrease in absolute value, but not in terms of integer multiples of $I_{syn}$, and this would therefore have no effect on the logic comparison between the total excitatory and inhibitory currents.

## 5. CONCLUSION

A method is presented to reduce the errors due to component mismatch in a VLSI neuron used in a neural network where information is transmitted by currents. Since the number of neurons in a chip is much smaller than the number of synapses, the area penalty of this neuron is not very significant. However, the errors in the decisions taken by the neurons are reduced, especially when relatively few synapses are connected (for example in sparsely coded memories). Furthermore, another improvement of this neuron, which cannot be precisely predicted, is that if the mirror is split into several parts, the probability that one mismatch will be compensated by another is increased. The neuron must of course be carefully designed, for example by physically inverting half of the mirrors, in order to compensate for oxide gradients.

## ACKNOWLEDGEMENTS

All our acknowledgements go to Brigitte Wénin-Dupont, who developed "Bananas", the graphical software used to plot the simulations of this paper, and helped us to use it.

## REFERENCES

- [1] E. Vittoz and X. Arreguit, "CMOS integration of Hérault-Jutten cells for separation of sources", in Analog implementation of neural systems, C. Mead and M. Ismail, eds., Kluwer Academic Publishers, Norwell, MA, 1989.
- [2] M. Sivilotti, M. Mahowald and C. Mead, "Real-time visual computation using analog CMOS processing arrays", Proceedings of the 1987 Stanford Conference on Advanced Research in VLSI, P. Losleben, ed., MIT Press, 1987.
- [3] M. Verleysen, B. Sirletti and P. Jespers, "A new VLSI architecture for large Hopfield's neural networks", Proceedings of ESSCIRC '88, Manchester, U.K., 1988.
- [4] M. Verleysen, B. Sirletti, A. Vandemeulebroecke and P. Jespers, "Neural networks for high-storage content-addressable memory: VLSI circuit and learning algorithm", IEEE Journal of Solid-State Circuits, vol. 24, no. 3, 1989.