# Introduction

In 1982, based on his studies of collective dynamical computation in neural networks, Hopfield [1,2,3] proposed an influential recurrent neural network with many potential applications, such as content-addressable memory and an optimization engine for the traveling-salesman problem. Using the Lyapunov direct method, he formulated an energy function for the network and showed that the network converges to a stable state if its weights are symmetric and no node has self-feedback. The Hopfield network comes in two forms, analog and discrete. In either form, however, the network can only be programmed to memorize patterns with the Hebbian rule, and its memory capacity is limited to about 0.15N patterns, where N is the number of nodes in the network. Many researchers have tried to address the network's capacity and trainability problems [4,5]. For example, instead of memorizing the patterns in a single presentation cycle, Gardner [6,7,8] improved the network by presenting the training patterns repeatedly and using the perceptron convergence procedure to train each node to generate the correct state given the states of all the other nodes for a particular training vector.

A Boltzmann machine is a stochastic recurrent neural network with interconnected visible and hidden nodes, introduced by Hinton [9,10]. Like a Hopfield network, a Boltzmann machine has a similar energy function when its weights are symmetric, and it converges to a stable state when an input vector is presented to the visible nodes. Because a Boltzmann machine takes a long time to train, the restricted Boltzmann machine (RBM) was introduced [11,12,13]. It consists of two layers of nodes, L visible and M hidden, connected by symmetric weights with no intralayer connections. Each node makes a probabilistic decision to be either on or off. The connection restriction allows more efficient training algorithms, notably the gradient-based contrastive divergence algorithm, to be developed. The network can learn the probabilistic pattern of a set of inputs.

In this paper, an analog restricted Hopfield network (RHN) is proposed to solve the memory capacity and trainability issues of the Hopfield network. Like an RBM, the architecture consists of two layers of nodes, visible and hidden, connected by directional weighted connection paths. The network is a fully connected bipartite graph with no intralayer connections. The visible nodes are classified as either input or output nodes to manage the flow of information to and from the visible layer. An energy, or Lyapunov, function is derived to prove that the proposed network always converges to stable states when an input vector is presented. The network iterates, sending signals back and forth between the two layers until all its nodes reach an equilibrium state determined by the corresponding basin of attraction, generating the desired output vector. Two training algorithms are used to train the proposed network: Simultaneous Perturbation Stochastic Approximation (SPSA) [14,15] and Backpropagation Through Time (BPTT) [16,17]. The SPSA algorithm, introduced by Spall, is simple to implement: it can estimate the gradient of the error function using only two final error values of the function, which makes it well suited for training the proposed network. The BPTT algorithm, on the other hand, is based on the fact that the temporal operation of an RHN may be unfolded into a multilayer perceptron, so that the standard backpropagation algorithm can be applied.
Simulation results show that the proposed network can be trained as a dynamic classifier implementing the EXOR function. Using the characters A, U, T, and S as training patterns, the network was also trained as an associative memory. Simulation results show that the network re-creates these images perfectly even when the input image is noisy, and that it performs better than the standard Hopfield network and the RBM.

The paper is organized as follows. Section 2 presents background on the Hopfield network and the restricted Boltzmann machine. The proposed RHN with hidden nodes is presented in Section 3. In Section 4, two algorithms to train the network are introduced. Section 5 presents simulation results on the performance of the RHN.

# II. Background

# a) Hopfield Network

An analog Hopfield network consists of fully interconnected nodes modeled as amplifiers, in conjunction with feedback circuits composed of wires, resistors, and capacitors, as shown in Figure 1.

Figure 1: An analog Hopfield network

The dynamics of the network are described by the following differential equations:

$$\frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij} V_j - \frac{u_i}{\tau} + I_i, \qquad \tau = RC, \qquad V_i = g(u_i) \tag{1}$$

where N is the number of nodes in the network, $u_i$ is the input voltage of amplifier i, $T_{ij}$ is the weight or conductance connecting the output of node j to the input of node i, $V_j$ is the output of node j, $\tau = RC$ is the time constant of the network, $I_i$ is the input to node i, and g() is the output function of a node. Using the Lyapunov direct method, Hopfield derived the following energy function for the network, valid when the weights are symmetric and no node has self-feedback:

$$E = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} T_{ij} V_i V_j - \frac{1}{\tau}\sum_{i=1}^{N} V_i u_i - \sum_{i=1}^{N} V_i I_i \tag{2}$$

For the initial-value problem, the input $I_i$ is applied to node i at t = 0 and the network is then allowed to evolve; integrating the differential equations gives the evolution of the network states. Because the energy function exists, the network always converges to a stable state.

Hopfield networks can only be programmed to memorize patterns using the Hebbian rule. When the output function g() is a sigmoid function, the network transforms the initial input vector iteratively and continuously into an output vector in the range [0, 1]. To program the network to memorize specific binary input vectors S(p), p = 1 . . . P, the weight or conductance $T_{ij}$ is determined by

$$T_{ij} = \sum_{p=1}^{P} \big(2S_i(p) - 1\big)\big(2S_j(p) - 1\big) \tag{3}$$

When the output function g() is a hyperbolic tangent function, the network transforms the initial input vector iteratively and continuously into an output vector in the range [−1, 1]. To program the network to memorize specific input vectors S(p), p = 1 . . . P, the weight or conductance $T_{ij}$ is then determined by

$$T_{ij} = \sum_{p=1}^{P} S_i(p)\,S_j(p) \tag{4}$$
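To make the Hebbian programming of Eqs. (1)–(3) concrete, the following NumPy sketch (an illustration, not code from the paper) stores two patterns with the bipolar Hebbian rule and lets the network relax from a noisy probe. The gain, time constant, step size, and the reuse of the probe as the constant input $I_i$ are illustrative choices.

```python
import numpy as np

def hebbian_weights(patterns):
    """Program T with the Hebbian rule of Eq. (3): T_ij = sum_p (2S_i-1)(2S_j-1)."""
    S = 2.0 * np.asarray(patterns, dtype=float) - 1.0   # map {0,1} -> {-1,+1}
    T = S.T @ S
    np.fill_diagonal(T, 0.0)                            # no self-feedback
    return T

def relax(T, probe, steps=200, dt=0.05, tau=1.0, gain=4.0):
    """Euler integration of Eq. (1); g() is a sigmoid, so outputs stay in [0, 1].
    The probe is used both as the initial state and as the constant input I_i."""
    I = np.array(probe, dtype=float)
    u = I.copy()
    for _ in range(steps):
        V = 1.0 / (1.0 + np.exp(-gain * u))             # V_i = g(u_i)
        u += dt * (T @ V - u / tau + I)                 # du_i/dt = sum_j T_ij V_j - u_i/tau + I_i
    return (V > 0.5).astype(int)

patterns = [[1, 1, 0, 0], [0, 1, 1, 0]]
T = hebbian_weights(patterns)
print(relax(T, [1, 1, 0, 1]))   # settles on the nearest stored pattern, here [1 1 0 0]
```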
# b) Restricted Boltzmann Machine (RBM)

A restricted Boltzmann machine (RBM) is a stochastic recurrent neural network [4,9,10] consisting of two layers of nodes, L visible and M hidden, connected by symmetric weights with no intralayer connections. Each node makes a probabilistic decision to be either on or off. The network can learn the probabilistic pattern of a set of inputs. The dynamics of the network are described by the following equations.

In the forward path:

$$sum^H_i = \sum_{j=1}^{L} w^H_{ij} V^V_j + \theta^H_i, \qquad V^H_i = g(sum^H_i) \tag{5}$$

where $sum^H_i$ is the sum of all inputs to hidden node i, $w^H_{ij}$ is the weight connecting the output of visible node j to the input of hidden node i, $V^V_j$ is the output of visible node j, $\theta^H_i$ is the threshold of hidden node i, $V^H_i$ is the output of hidden node i, and g() is the sigmoid logistic output function of hidden node i.

In the backward path:

$$sum^V_j = \sum_{i=1}^{M} w^V_{ji} V^H_i + \theta^V_j, \qquad V^V_j = g(sum^V_j) \tag{6}$$

where $sum^V_j$ is the sum of all inputs to visible node j, $w^V_{ji}$ is the weight connecting the output of hidden node i to the input of visible node j, $V^H_i$ is the output of hidden node i, $\theta^V_j$ is the threshold of visible node j, $V^V_j$ is the output of visible node j, and g() is the sigmoid logistic output function of visible node j.

For a symmetric weight configuration, the energy or Lyapunov function of an RBM is given by

$$E = -\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij} V^H_i V^V_j - \sum_{j=1}^{L} \theta^V_j V^V_j - \sum_{i=1}^{M} \theta^H_i V^H_i \tag{7}$$

Because this energy function exists, the network always converges to a stable state when an input vector is presented to the visible nodes. During training, an input vector p is presented. Let $e^+_{ij}(p) = V^H_i V^V_j$ denote the correlation of hidden node i and visible node j in the forward direction, and $e^-_{ij}(p) = V^H_i \hat{V}^V_j$ denote the correlation in the backward direction, where $\hat{V}^V_j$ is the value of visible node j of pattern p as estimated by the network. The weights are updated during training by

$$w_{ij}(k+1) = w_{ij}(k) + \eta\,\big(e^+_{ij}(p) - e^-_{ij}(p)\big) \tag{8}$$

where η is a small positive number.

# III. Proposed Restricted Hopfield Network (RHN)

An analog restricted Hopfield network (RHN) is proposed to solve the memory capacity and trainability issues of the Hopfield network. Like an RBM, the architecture consists of two layers of nodes, L visible and M hidden, connected by directional weighted connection paths, as shown in Figure 2. The network is a fully connected bipartite graph and has no intralayer connections.

Figure 2: A Restricted Hopfield Network

The following differential equations describe the dynamics of the network.

In the forward path:

$$\frac{du^H_i}{dt} = \sum_{j=1}^{L} w^H_{ij} V^V_j + \theta^H_i, \qquad V^H_i = g(u^H_i) \tag{9}$$

with initial conditions

$$V^V_j(0) = I_j, \qquad V^H_i(0) = 0$$

where $u^H_i$ is the sum of all inputs to hidden node i, $w^H_{ij}$ is the weight connecting the output of visible node j to the input of hidden node i, $V^V_j$ is the output of visible node j, $\theta^H_i$ is the threshold of hidden node i, $V^H_i$ is the output of hidden node i, g() is the output function of hidden node i, and $I_j$ is the initial input presented to visible node j.

In the backward path:

$$\frac{du^V_j}{dt} = \sum_{i=1}^{M} w^V_{ji} V^H_i + \theta^V_j, \qquad V^V_j = g(u^V_j) \tag{10}$$

where $u^V_j$ is the sum of all inputs to visible node j, $w^V_{ji}$ is the weight connecting the output of hidden node i to the input of visible node j, $V^H_i$ is the output of hidden node i, $\theta^V_j$ is the threshold of visible node j, $V^V_j$ is the output of visible node j, and g() is the output function of visible node j.

The network input can be either digital or analog, taking values between 0 and 1. With a sigmoid function as the output function g() for all the nodes, the outputs of all the nodes take values between 0 and 1. The proposed network also works when the hyperbolic tangent function is used as the output function; in that case, the outputs of all the nodes take values between −1 and 1.
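The dynamics of Eqs. (9) and (10) can be illustrated with a simplified discrete sketch that iterates the forward and backward maps to a fixed point instead of integrating the differential equations; the weights below are random and untrained, and the sizes are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_relax(W, theta_h, theta_v, I, max_iters=100, tol=1e-6):
    """Iterate the forward/backward maps of Eqs. (9)-(10) until the visible
    outputs stop changing.  W has shape (M hidden, L visible); the backward
    weights are taken as W.T (symmetric configuration)."""
    V_v = np.asarray(I, dtype=float)            # visible nodes start at the input vector
    V_h = np.zeros(W.shape[0])                  # hidden nodes start at 0
    for _ in range(max_iters):
        V_h = sigmoid(W @ V_v + theta_h)        # forward path: visible -> hidden
        V_v_new = sigmoid(W.T @ V_h + theta_v)  # backward path: hidden -> visible
        if np.max(np.abs(V_v_new - V_v)) < tol:
            V_v = V_v_new
            break
        V_v = V_v_new
    return V_v, V_h

# toy example with random (hypothetical) weights: 3 visible, 5 hidden nodes
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
V_v, V_h = rhn_relax(W, np.zeros(5), np.zeros(3), I=[1.0, 0.0, 1.0])
print(np.round(V_v, 3))
```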
Based on the Lyapunov direct method, it can be shown that Eqn. 11 is an energy, or Lyapunov, function of the proposed network:

$$E = -\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij} V^H_i V^V_j - \frac{1}{2}\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji} V^V_j V^H_i - \sum_{i=1}^{M} V^H_i \theta^H_i - \sum_{j=1}^{L} V^V_j \theta^V_j \tag{11}$$

Differentiating E gives

$$\frac{dE}{dt} = -\frac{d}{dt}\Big(\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij} V^H_i V^V_j\Big) - \frac{d}{dt}\Big(\frac{1}{2}\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji} V^V_j V^H_i\Big) - \frac{d}{dt}\Big(\sum_{i=1}^{M} V^H_i \theta^H_i\Big) - \frac{d}{dt}\Big(\sum_{j=1}^{L} V^V_j \theta^V_j\Big) \tag{12}$$

Expanding all the terms in Eqn. 12 gives

$$\frac{d}{dt}\Big(\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij} V^H_i V^V_j\Big) = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij}\frac{dV^H_i}{dt} V^V_j + \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij} V^H_i \frac{dV^V_j}{dt}$$

$$\frac{d}{dt}\Big(\frac{1}{2}\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji} V^V_j V^H_i\Big) = \frac{1}{2}\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji}\frac{dV^V_j}{dt} V^H_i + \frac{1}{2}\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji} V^V_j \frac{dV^H_i}{dt}$$

$$\frac{d}{dt}\Big(\sum_{i=1}^{M} V^H_i \theta^H_i\Big) = \sum_{i=1}^{M}\frac{dV^H_i}{dt}\theta^H_i, \qquad \frac{d}{dt}\Big(\sum_{j=1}^{L} V^V_j \theta^V_j\Big) = \sum_{j=1}^{L}\frac{dV^V_j}{dt}\theta^V_j \tag{13}$$

Now consider the forward path. Because the outputs of the visible nodes are held constant in the forward path,

$$\frac{dV^V_j}{dt} = 0 \tag{14}$$

Assume that all the weights are symmetric:

$$w^H_{ij} = w^V_{ji} \tag{15}$$

Equation 12 then reduces to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\sum_{j=1}^{L} w^H_{ij}\frac{dV^H_i}{dt} V^V_j - \sum_{i=1}^{M}\frac{dV^H_i}{dt}\theta^H_i \tag{16}$$

which can be simplified to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{dt}\Big(\sum_{j=1}^{L} w^H_{ij} V^V_j + \theta^H_i\Big) \tag{17}$$

and further to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{dt}\,\frac{du^H_i}{dt} \tag{18}$$

Using the chain rule,

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{du^H_i}\,\frac{du^H_i}{dt}\,\frac{du^H_i}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{du^H_i}\Big(\frac{du^H_i}{dt}\Big)^2 \tag{19}$$

Since

$$V^H_i = g(u^H_i) \tag{20}$$

and the output function is a sigmoid or hyperbolic tangent function, $dV^H_i/du^H_i$ is always positive (21), so dE/dt is always negative in the forward path.

Now consider the backward path. Because the outputs of the hidden nodes are held constant in the backward path, $dV^H_i/dt = 0$. Assume again that all the weights are symmetric:

$$w^H_{ij} = w^V_{ji} \tag{22}$$

Equation 12 then reduces to

$$\frac{dE}{dt} = -\sum_{j=1}^{L}\sum_{i=1}^{M} w^V_{ji}\frac{dV^V_j}{dt} V^H_i - \sum_{j=1}^{L}\frac{dV^V_j}{dt}\theta^V_j \tag{23}$$

which can be simplified to

$$\frac{dE}{dt} = -\sum_{j=1}^{L}\frac{dV^V_j}{dt}\Big(\sum_{i=1}^{M} w^V_{ji} V^H_i + \theta^V_j\Big) \tag{24}$$

and further to

$$\frac{dE}{dt} = -\sum_{j=1}^{L}\frac{dV^V_j}{dt}\,\frac{du^V_j}{dt} \tag{25}$$

Using the chain rule,

$$\frac{dE}{dt} = -\sum_{j=1}^{L}\frac{dV^V_j}{du^V_j}\,\frac{du^V_j}{dt}\,\frac{du^V_j}{dt} = -\sum_{j=1}^{L}\frac{dV^V_j}{du^V_j}\Big(\frac{du^V_j}{dt}\Big)^2 \tag{26}$$

Since

$$V^V_j = g(u^V_j) \tag{27}$$

and the output function is a sigmoid or hyperbolic tangent function, $dV^V_j/du^V_j$ is always positive, so dE/dt is always negative in the backward path as well. With the existence of this energy function, the proposed RHN always converges to a stable state, and it always generates analog outputs between 0 and 1 when g() is a sigmoid function, or between −1 and 1 when g() is a hyperbolic tangent function.

The proposed RHN is a dynamic system. It therefore has attractors toward which the system tends to evolve for a wide variety of initial conditions, and the existence of a basin of attraction for each attractor guarantees that any initial condition in the nearby region will iterate to that attractor. When an input vector in a specific basin of attraction is presented, the network sends signals back and forth between the hidden and visible layers until all the nodes reach an equilibrium state that minimizes the energy function above, generating the desired output vector.
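The forward-path argument of Eqs. (14)–(21) can be checked numerically: with the visible outputs held fixed and symmetric weights, the energy of Eqn. 11 should never increase along the trajectory. The sketch below does this for a small random network with Euler integration; the sizes, weights, and step size are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(W, Vh, Vv, th_h, th_v):
    """Eq. (11) with symmetric weights: the two bilinear terms collapse into one."""
    return -Vh @ W @ Vv - th_h @ Vh - th_v @ Vv

rng = np.random.default_rng(1)
M, L = 4, 6                          # hidden, visible nodes (toy sizes)
W = rng.normal(size=(M, L))
th_h, th_v = rng.normal(size=M), rng.normal(size=L)

u_h = np.zeros(M)                    # hidden nodes start at rest
Vv = rng.uniform(size=L)             # visible outputs are clamped during the forward phase
drive = W @ Vv + th_h                # du_h/dt of Eq. (9), constant while Vv is held fixed
E_prev = np.inf
for _ in range(500):                 # Euler integration of the forward phase
    u_h += 0.05 * drive
    Vh = sigmoid(u_h)
    E = energy(W, Vh, Vv, th_h, th_v)
    assert E <= E_prev + 1e-9        # energy never increases along the trajectory
    E_prev = E
print("final energy:", round(E_prev, 4))
```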
The visible nodes can be divided into A input nodes and B output nodes, as shown in Figure 3. In the forward computation path, each node in the hidden layer receives the weighted outputs of both the input and output nodes of the visible layer. In the backward computation direction, only the output nodes receive the weighted outputs of the hidden nodes. Therefore, when an input vector is presented to the input nodes, signals are sent back and forth between the hidden and output nodes until an equilibrium state is reached.

The following differential equations describe the dynamics of this configuration.

In the forward path:

$$\frac{du^H_i}{dt} = \sum_{j=1}^{B} w^H_{ij} V^O_j + \sum_{k=1}^{A} w^H_{ik} I_k + \theta^H_i, \qquad V^H_i = g(u^H_i) \tag{28}$$

with initial conditions

$$V^O_j(0) = 0, \qquad V^H_i(0) = 0 \tag{29}$$

where $u^H_i$ is the sum of all inputs to hidden node i, $w^H_{ij}$ is the weight connecting the output of output node j to the input of hidden node i, $V^O_j$ is the output of output node j, $w^H_{ik}$ is the weight connecting the output of input node k to the input of hidden node i, $I_k$ is the output of input node k, $\theta^H_i$ is the threshold of hidden node i, $V^H_i$ is the output of hidden node i, and g() is the output function of hidden node i.

In the backward path:

$$\frac{du^O_j}{dt} = \sum_{i=1}^{M} w^O_{ji} V^H_i + \theta^O_j, \qquad V^O_j = g(u^O_j) \tag{30}$$

where $u^O_j$ is the sum of all inputs to output node j, $w^O_{ji}$ is the weight connecting the output of hidden node i to the input of output node j, $V^H_i$ is the output of hidden node i, $\theta^O_j$ is the threshold of output node j, $V^O_j$ is the output of output node j, and g() is the output function of output node j.

The proposed RHN always generates analog outputs between 0 and 1 when g() is a sigmoid function, or between −1 and 1 when g() is a hyperbolic tangent function. Using the Lyapunov direct method, it can be shown that Eqn. 31 is an energy, or Lyapunov, function of the proposed network:

$$E = -\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij} V^H_i V^O_j - \frac{1}{2}\sum_{j=1}^{B}\sum_{i=1}^{M} w^O_{ji} V^O_j V^H_i - \sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik} V^H_i I_k - \sum_{i=1}^{M} V^H_i \theta^H_i - \sum_{j=1}^{B} V^O_j \theta^O_j \tag{31}$$

Differentiating E gives

$$\frac{dE}{dt} = -\frac{d}{dt}\Big(\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij} V^H_i V^O_j\Big) - \frac{d}{dt}\Big(\frac{1}{2}\sum_{j=1}^{B}\sum_{i=1}^{M} w^O_{ji} V^O_j V^H_i\Big) - \frac{d}{dt}\Big(\sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik} V^H_i I_k\Big) - \frac{d}{dt}\Big(\sum_{i=1}^{M} V^H_i \theta^H_i\Big) - \frac{d}{dt}\Big(\sum_{j=1}^{B} V^O_j \theta^O_j\Big) \tag{32}$$

Expanding the terms in Eqn. 32 gives

$$\frac{d}{dt}\Big(\frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij} V^H_i V^O_j\Big) = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij}\frac{dV^H_i}{dt} V^O_j + \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij} V^H_i \frac{dV^O_j}{dt}$$

$$\frac{d}{dt}\Big(\frac{1}{2}\sum_{j=1}^{B}\sum_{i=1}^{M} w^O_{ji} V^O_j V^H_i\Big) = \frac{1}{2}\sum_{j=1}^{B}\sum_{i=1}^{M} w^O_{ji}\frac{dV^O_j}{dt} V^H_i + \frac{1}{2}\sum_{j=1}^{B}\sum_{i=1}^{M} w^O_{ji} V^O_j \frac{dV^H_i}{dt}$$

$$\frac{d}{dt}\Big(\sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik} V^H_i I_k\Big) = \sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik}\frac{dV^H_i}{dt} I_k + \sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik} V^H_i \frac{dI_k}{dt} \tag{33}$$

For a constant input $I_k$,

$$\frac{dI_k}{dt} = 0 \tag{34}$$

so that

$$\frac{d}{dt}\Big(\sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik} V^H_i I_k\Big) = \sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik}\frac{dV^H_i}{dt} I_k \tag{35}$$

For the remaining terms,

$$\frac{d}{dt}\sum_{i=1}^{M} V^H_i \theta^H_i = \sum_{i=1}^{M}\frac{dV^H_i}{dt}\theta^H_i, \qquad \frac{d}{dt}\sum_{j=1}^{B} V^O_j \theta^O_j = \sum_{j=1}^{B}\frac{dV^O_j}{dt}\theta^O_j \tag{36}$$

Now consider the forward path. Because the outputs of the output nodes are held constant in the forward path,

$$\frac{dV^O_j}{dt} = 0 \tag{37}$$

Assume that all the weights are symmetric:

$$w^H_{ij} = w^O_{ji} \tag{38}$$

Equation 32 then reduces to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\sum_{j=1}^{B} w^H_{ij}\frac{dV^H_i}{dt} V^O_j - \sum_{i=1}^{M}\sum_{k=1}^{A} w^H_{ik}\frac{dV^H_i}{dt} I_k - \sum_{i=1}^{M}\frac{dV^H_i}{dt}\theta^H_i \tag{39}$$

which can be simplified to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{dt}\Big(\sum_{j=1}^{B} w^H_{ij} V^O_j + \sum_{k=1}^{A} w^H_{ik} I_k + \theta^H_i\Big) \tag{40}$$

and further to

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{dt}\,\frac{du^H_i}{dt} \tag{41}$$

Using the chain rule,

$$\frac{dE}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{du^H_i}\,\frac{du^H_i}{dt}\,\frac{du^H_i}{dt} = -\sum_{i=1}^{M}\frac{dV^H_i}{du^H_i}\Big(\frac{du^H_i}{dt}\Big)^2 \tag{42}$$

Since

$$V^H_i = g(u^H_i) \tag{43}$$

and the output function is a sigmoid or hyperbolic tangent function, $dV^H_i/du^H_i$ is always positive (44), so dE/dt is always negative in the forward path. An analogous argument applies in the backward path, so the energy is non-increasing in both directions and the network converges to an equilibrium state.

The proposed RHN is a dynamic system: when an input vector lying in a specific basin of attraction is presented to the input nodes, the network sends signals back and forth between the hidden and output nodes until it reaches an equilibrium state, the corresponding attractor.
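A discrete sketch of the input/output configuration of Eqs. (28)–(30) is shown below: the input nodes stay clamped to the input vector while signals are exchanged between the hidden and output nodes for a fixed number of feedback iterations. The weights are random and untrained, and the sizes (2 inputs, 4 hidden, 1 output) merely mirror the EXOR experiment described later.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_io_relax(W_in, W_out, th_h, th_o, I, iters=20):
    """Discrete sketch of Eqs. (28)-(30): the A input nodes stay clamped to I,
    while signals bounce between the M hidden and B output nodes.
    W_in: (M, A) input->hidden weights; W_out: (M, B) shared hidden<->output weights."""
    V_o = np.zeros(W_out.shape[1])                      # output nodes start at 0, Eq. (29)
    for _ in range(iters):
        V_h = sigmoid(W_out @ V_o + W_in @ I + th_h)    # forward path, Eq. (28)
        V_o = sigmoid(W_out.T @ V_h + th_o)             # backward path, Eq. (30)
    return V_o

# hypothetical sizes: 2 inputs, 4 hidden, 1 output (the EXOR setup used later)
rng = np.random.default_rng(2)
W_in, W_out = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))
print(rhn_io_relax(W_in, W_out, np.zeros(4), np.zeros(1), I=np.array([1.0, 0.0])))
```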
# IV. Training of the Proposed RHN

The proposed network can be trained using either a modified SPSA algorithm or the BPTT algorithm; both are modified to ensure that all the weights remain symmetric, as described in the following.

# a) SPSA Algorithm

As mentioned earlier (Equations 3 and 4), the conventional Hebbian rule is not suitable for training the proposed network because the outputs of the hidden nodes are unknown and the equations cannot handle analog quantities. The simultaneous perturbation stochastic approximation (SPSA) algorithm uses a gradient approximation that requires only 2N objective-function measurements over N iterations (two per iteration), regardless of the dimension of the network being optimized [4,9]. The SPSA algorithm, given by the following formulae, is therefore well suited to the high-dimensional problem of minimizing an objective function that depends on many adjustable symmetric weights.

At each iteration, a simultaneous perturbation matrix $\Delta(k)$ with mutually independent zero-mean random elements is generated; each element $\Delta_{ij}(k)$ is +1 or −1 with probability 0.5. Two weight matrices W+ and W− are formed by adding and subtracting $\Delta(k)$, scaled by the gain sequence c(k), to and from the current weight matrix W(k), and their respective contributions J(W+) and J(W−) to the objective function are computed. Based on the outcome of this evaluation, and scaled by the gain sequences a(k) and c(k), the current weight matrix W is updated:

$$\Delta W_{ij}(k) = \frac{J\big(W(k)+c(k)\Delta(k)\big) - J\big(W(k)-c(k)\Delta(k)\big)}{2\,c(k)\,\Delta_{ij}(k)}, \qquad W_{ij}(k+1) = W_{ji}(k+1) = W_{ij}(k) - a(k)\,\Delta W_{ij}(k) \tag{55}$$

The gain sequences a(k) and c(k) decrease as the number of iterations k increases and converge to 0 as k approaches ∞. The objective function J used for the optimization of the proposed RHN is

$$J = \sum_{i=1}^{P}\sum_{j=1}^{B}\big(D_{ij} - V^O_{ij}(k)\big)^2 \tag{56}$$

where $D_{ij}$ is the j-th element of the desired output vector i, $V^O_{ij}(k)$ is the output value of the j-th output node when training pattern i is presented, B is the number of output nodes, and P is the number of training patterns.
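A minimal implementation of the SPSA update of Eq. (55) is sketched below. The power-law gain sequences and the symmetrization by averaging are common practical choices rather than details specified in the paper, and the toy objective is only there to show the update converging.

```python
import numpy as np

def spsa_step(W, loss, k, a0=0.1, c0=0.1, alpha=0.602, gamma=0.101,
              rng=np.random.default_rng()):
    """One SPSA iteration, Eq. (55).  `loss` evaluates the objective J for a full
    weight matrix; only two evaluations are needed regardless of W's size.
    The gain-sequence forms a0/(k+1)^alpha and c0/(k+1)^gamma are assumptions."""
    a_k = a0 / (k + 1) ** alpha
    c_k = c0 / (k + 1) ** gamma
    delta = rng.choice([-1.0, 1.0], size=W.shape)         # Bernoulli +/-1 perturbation
    grad_est = (loss(W + c_k * delta) - loss(W - c_k * delta)) / (2.0 * c_k * delta)
    W_new = W - a_k * grad_est
    return (W_new + W_new.T) / 2.0                        # one simple way to keep W symmetric

# toy usage: drive a symmetric 4x4 matrix toward a fixed target
target = np.array([[0., 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
loss = lambda W: np.sum((W - target) ** 2)
W = np.zeros((4, 4))
for k in range(2000):
    W = spsa_step(W, loss, k)
print(round(loss(W), 4))   # far below the initial loss of 8.0
```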
# b) Backpropagation Through Time (BPTT) Algorithm

For simplicity, and to make the BPTT training algorithm applicable, the proposed network of Figure 3 can be transformed into a discrete-time system and unfolded in time, as shown in Figure 4; the structure of block W is shown in Figure 5. The error function J used for the optimization of the proposed RHN is

$$J = \frac{1}{2}\sum_{p=1}^{P}\sum_{j=1}^{B}\big(D_{pj} - V^O_{pj}(T)\big)^2 \tag{57}$$

where $D_{pj}$ is the j-th element of the desired output vector for pattern p, $V^O_{pj}(T)$ is the output value of the j-th output node at the final time step T when training pattern p is presented, B is the number of output nodes, and P is the number of training patterns. Applying the BPTT algorithm, the weight changes are

$$\Delta W^O_{ji} = \eta\big(\delta^O_{pj}\,V^H_{pi}(T) + \delta^{H2}_{pj}\,V^H_{pi}(T-1)\big) \tag{58}$$

For j = 1 . . . A:

$$\Delta W^H_{ij} = \eta\big(\delta^{H1}_{pi} + \delta^{H3}_{pi}\big)\,I_{pj} \tag{59}$$

For j = A+1 . . . A+B, k = 1 . . . B:

$$\Delta W^H_{ij} = \eta\big(\delta^{H1}_{pi}\,V^O_{pk}(T-1) + \delta^{H3}_{pi}\,V^O_{pk}(T-2)\big) \tag{60}$$

where the deltas are

$$\delta^O_{pj} = g'\big(u^O_{pj}(T)\big)\big(D_{pj} - V^O_{pj}(T)\big), \qquad \delta^{H1}_{pi} = g'\big(u^H_{pi}(T)\big)\sum_{j=1}^{B}\delta^O_{pj} W^O_{ji},$$
$$\delta^{H2}_{pj} = g'\big(u^O_{pj}(T-1)\big)\sum_{i=1}^{M}\delta^{H1}_{pi} W^H_{ij}, \qquad \delta^{H3}_{pi} = g'\big(u^H_{pi}(T-1)\big)\sum_{j=1}^{B}\delta^{H2}_{pj} W^O_{ji} \tag{61}$$

Since the weights are symmetric,

$$W^O_{ji}(T+1) = W^H_{ij}(T+1) = W^O_{ji}(T) + \Delta W^O_{ji} + \Delta W^H_{ij} \tag{62}$$

# V. Simulation Results

# a) Comparison of the Hopfield Network and the RHN

Consider storing the three vectors [1100], [0110], and [0101] in a Hopfield network. The network can be programmed to memorize these three patterns using the Hebbian rule, and the weight matrix is

$$\begin{pmatrix} 0 & 1 & -1 & -1\\ 1 & 0 & -1 & -1\\ -1 & -1 & 0 & 1\\ -1 & -1 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & -1 & -1 & 1\\ -1 & 0 & 1 & -1\\ -1 & 1 & 0 & -1\\ 1 & -1 & -1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & -1 & 1 & -1\\ -1 & 0 & -1 & 1\\ 1 & -1 & 0 & -1\\ -1 & 1 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & -1 & -1\\ -1 & 0 & -1 & -1\\ -1 & -1 & 0 & -1\\ -1 & -1 & -1 & 0 \end{pmatrix}$$

This weight matrix generates the correct output vector when the corresponding input vector, even with some noise, is presented. Now consider storing a fourth vector, [1111], in the network. The weight matrix that stores [1111] is

$$\begin{pmatrix} 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1\\ 1 & 1 & 1 & 0 \end{pmatrix}$$

Adding this new weight matrix to the previous weight matrix of the Hopfield network programmed to store the three vectors [1100], [0110], and [0101] yields a weight matrix with all zero elements, erasing all the previous programming:

$$\begin{pmatrix} 0 & -1 & -1 & -1\\ -1 & 0 & -1 & -1\\ -1 & -1 & 0 & -1\\ -1 & -1 & -1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1\\ 1 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Therefore, the Hopfield network cannot be programmed to remember all four vectors [1100], [0110], [0101], and [1111] using the Hebbian rule. However, by introducing five hidden nodes in the proposed RHN, all four vectors can easily be stored and recalled correctly. The simulation shows that it is possible to increase the Hopfield network's memory capacity by using more hidden nodes: by increasing the number of hidden nodes to 50, the proposed network can remember all 16 binary vectors.

# b) EXOR Problem

The EXOR, or exclusive-or, problem is a classical problem in neural network research: predict the output of an EXOR logic gate given two binary inputs. The network should return a "1" if the two inputs are not equal and a "0" if they are equal. The EXOR problem appears simple, but Minsky and Papert showed in 1969 that it was a serious problem for the neural network architectures of the 1960s, which makes it a good test for the proposed network [18]. An RHN with input and output nodes, as shown in Figure 3, is used: it has 2 input nodes, one output node, and four hidden nodes, and it was trained to learn the EXOR function. Figure 6 shows the training curve of the network using the SPSA algorithm described earlier; the network is trained after 1000 training iterations. Figures 7 and 8 show the basin of attraction and the network's energy profile, indicating four attractors, one for each input pair. Table 1 illustrates the effect of feedback as the output of the RHN evolves when an initial input is presented; the EXOR output takes three to four feedback iterations to settle on the correct result.

# c) Associative Memory Problem

An RHN with 35 input nodes, 35 output nodes, and ten hidden nodes was created. The network was then trained to perform auto-encoding of the characters A, U, T, and S, each represented by 5 × 7 pixels. The input characters are distorted by changing some of the pixels to test the network's ability to re-create the characters in the presence of noise. In Figure 9, the distorted images of A, U, T, and S are on the left and the re-created images are on the right. The figure shows that the network re-creates these images perfectly even when distorted images with a Hamming distance of 13 are presented. As shown in Figure 10, an RHN trained with the BPTT algorithm re-creates the images with an average error rate of 2.4% when the Hamming distance is 5, a significant improvement over the RHN trained with the SPSA algorithm, which produces an error rate of more than 4.4% for the same input. The classical Hopfield network and the RBM achieve error rates of 22.6% and 13.4%, respectively, for input vectors with the same Hamming distance. The RHN therefore performs better than both the Hopfield network and the RBM.
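The noise test behind Figures 9 and 10 can be summarized by a small evaluation loop: flip a chosen number of pixels (the Hamming distance) in a stored pattern, recall it with the trained network, and count the wrong output bits. The paper does not spell out its exact protocol, so the sketch below is only indicative; `trained_rhn_recall` is a hypothetical wrapper around a trained RHN such as the relaxation code above.

```python
import numpy as np

def add_noise(pattern, hamming_distance, rng):
    """Flip `hamming_distance` randomly chosen pixels of a binary pattern."""
    noisy = np.array(pattern, dtype=float)
    idx = rng.choice(len(noisy), size=hamming_distance, replace=False)
    noisy[idx] = 1.0 - noisy[idx]
    return noisy

def recall_error_rate(recall_fn, patterns, hamming_distance, trials=100, seed=0):
    """Average fraction of wrong output bits over noisy probes of each stored pattern."""
    rng = np.random.default_rng(seed)
    errors, total = 0, 0
    for p in patterns:
        for _ in range(trials):
            probe = add_noise(p, hamming_distance, rng)
            out = (recall_fn(probe) > 0.5).astype(float)   # threshold the analog outputs
            errors += int(np.sum(out != np.asarray(p)))
            total += len(p)
    return errors / total

# usage sketch (hypothetical names): `trained_rhn_recall` wraps a trained RHN,
# and A, U, T, S are the 35-pixel character vectors.
# error = recall_error_rate(trained_rhn_recall, [A, U, T, S], hamming_distance=5)
```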
# d) Creating an Associative Memory Model of Handwritten Digits

The MNIST handwritten digit classification problem is a standard dataset used in computer vision and deep learning. The dataset contains 60,000 images, typically split into 50,000 training images and 10,000 validation images. In this example, instead of performing handwritten-digit classification, some of the 50,000 MNIST training images are used as inputs to be associated with 3 × 5 pixel models representing the handwritten digits, as shown in Figure 11. All 10,000 validation MNIST images are used as test images to verify the network's associative function.

The architecture used to process the MNIST images is inspired by a convolutional neural network architecture [19,20], as shown in Figure 12. It consists of several layers: Input, 2D-Convolution, ReLU (rectified linear unit), Max-pooling, Matrix-to-Vector, and RHN. The function of each layer is as follows:

- The Input layer holds the raw pixel values of the image, in this case an image of width 28 and height 28.
- The 2D-Convolution layer computes the outputs of nodes that are connected to local regions in the input image, each computing a dot product between its weights and the small area it is connected to. Since four filters (horizontal-line, vertical-line, 45-degree-line, and 135-degree-line detection) are used in the architecture, this results in a volume of [26 × 26 × 4].
- The ReLU layer applies the elementwise activation function max(0, x), thresholding at zero. This layer leaves the size of the volume unchanged ([26 × 26 × 4]).
- The Max-pooling layer performs a downsampling operation along the spatial dimensions (width, height), resulting in a smaller volume of [7 × 7 × 4].
- The Matrix-to-Vector layer converts the volume (in this case [7 × 7 × 4] elements) into a linear vector of 196 elements.
- An RHN with 196 input nodes, 100 hidden nodes, and 15 output nodes is trained using the BPTT algorithm to associate an input image with a digit model of 3 × 5 pixels.

The network architecture was inspired by the organization of the visual cortex, having a connectivity pattern similar to that of the retina, ganglion cells, and neurons in a human brain. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field; a collection of such fields overlaps to cover the entire image area.

An RHN was trained with 1 percent of the 50,000 training images. Figure 13 shows the responses of the RHN when different input images are presented at zero, one, and two feedback iterations. An RHN with zero feedback iterations is a feedforward multilayer perceptron (MLP) acting as a mapping function. When input images "0" and "4" are presented, models of "0" and "4" with a 1-bit error are created as outputs at iteration 0, and the network generates the correct models of "0" and "4" after the first iteration. When input image "1" is presented, it takes two feedback iterations to generate the correct model of "1". When input image "6" is presented, the network first generates a wrong model of "5" and then corrects its output to the correct model of "6" after the first feedback iteration.

The effect of feedback iterations on the performance of the RHN is illustrated in Figure 14. The figure shows that the network can re-create half of the models correctly without any feedback iteration, but its performance improves when one or more feedback iterations are applied. This result illustrates the importance of feedback iteration in implementing an associative memory using the proposed RHN. Figure 15 shows the network's performance after being trained with only 500 and with 5000 training images: the network re-creates 6674 of the validation images perfectly when 500 training images are used, and 7849 when 5000 training images are used.
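The layer sizes listed in Section V(d) can be reproduced with the short sketch below. The paper specifies only the filter orientations and the resulting volumes, so the 3 × 3 kernel values and the pad-then-pool step used here to obtain the stated 7 × 7 × 4 output are assumptions; the end result is the 196-element vector fed to the RHN.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' cross-correlation; a 28x28 image and a 3x3 kernel give 26x26."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size):
    """Non-overlapping max-pooling with a square window."""
    H, W = fmap.shape
    trimmed = fmap[:H - H % size, :W - W % size]
    return trimmed.reshape(H // size, size, W // size, size).max(axis=(1, 3))

# four fixed 3x3 line detectors (horizontal, vertical, two diagonals) -- assumed values
filters = [np.array(k, dtype=float) for k in (
    [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
    [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]])]

img = np.random.default_rng(3).random((28, 28))                   # stand-in for an MNIST image
maps = [np.maximum(conv2d_valid(img, k), 0.0) for k in filters]   # conv + ReLU: 4 maps of 26x26
pooled = [max_pool(np.pad(m, 1), 4) for m in maps]                # pad to 28x28, 4x4 pool -> 7x7
vector = np.concatenate([p.ravel() for p in pooled])              # 7*7*4 = 196-element RHN input
print(vector.shape)
```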
# Conclusion

A trainable analog restricted Hopfield network is presented in this paper. An energy, or Lyapunov, function was derived to show that the proposed network converges to stable states when an input vector is introduced. The proposed network can be trained using either the modified SPSA or the BPTT algorithm, which ensure that all the weights remain symmetric. Simulation results show that the presence of hidden nodes increases the network's memory capacity. Using A, U, T, and S as training characters, the network can be trained to be an associative memory. Simulation results show that the network can perform perfect re-creation of noisy images and that it performs better than the standard Hopfield network and the RBM. The results also illustrate the importance of feedback iteration when the proposed RHN is used as an associative memory to re-create images from noisy inputs.

Figure captions:

- Figure 3: A Restricted Hopfield Network with input and output nodes
- Figure 4: Unfolded RHN in two time steps. W represents all the weights in the proposed RHN; $V^O(T)$ and $V^O(T-1)$ are the outputs of the output nodes at T and T−1, respectively. The structure of the block is presented in Figure 5.
- Figure 5: Structure of block W in Figure 4
- Figure 6: The training plot of the squared error J
- Figure 7: Basin of attraction of an RHN implementing the EXOR function
- Figure 8: The energy function of an RHN implementing the EXOR function
- Figure 9: Distorted and re-created images of A, U, T, and S
- Figure 10: Performance comparison of the Hopfield network and the RBM with the RHN
- Figure 11: 3 × 5 pixel models of the digits
- Figure 12: Image processing and RHN
- Figure 13: RHN sample output at different iterations

Table 1: Output at each feedback iteration of the RHN implementing the EXOR function

| Input | T = 0 | T = 1 | T = 2 | T = 3 | T = 5 | T = 6 |
|-------|-------|-------|-------|-------|-------|-------|
| 00    | 0     | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| 01    | 0     | 0.258 | 0.510 | 0.638 | 0.847 | 0.997 |
| 10    | 0     | 0.316 | 0.521 | 0.851 | 0.990 | 0.995 |
| 11    | 0     | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
# References

1. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, 1982.
2. J. J. Hopfield and D. W. Tank, "'Neural' computation of decisions in optimization problems," Biological Cybernetics, vol. 52, no. 3, 1985.
3. J. J. Hopfield and D. W. Tank, "Computing with neural circuits: A model," Science, vol. 233, no. 4764, 1986.
4. R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Transactions on Information Theory, vol. 33, no. 4, 1987.
5. A. J. Storkey and R. Valabregue, "The basins of attraction of a new Hopfield learning rule," Neural Networks, vol. 12, no. 6, 1999.
6. E. Gardner, "Maximum storage capacity in neural networks," Europhysics Letters (EPL), vol. 4, no. 4, p. 481, 1987.
7. E. Gardner, "The space of interactions in neural network models," Journal of Physics A: Mathematical and General, vol. 21, no. 1, p. 257, 1988.
8. E. Gardner and B. Derrida, "Optimal storage properties of neural network models," Journal of Physics A: Mathematical and General, vol. 21, no. 1, p. 271, 1988.
9. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science, vol. 9, no. 1, 1985.
10. R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Artificial Intelligence and Statistics, PMLR, 2009.
11. G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, Springer, 2012.
12. I. Sutskever, G. E. Hinton, and G. W. Taylor, "The recurrent temporal restricted Boltzmann machine," in Advances in Neural Information Processing Systems, 2009.
13. R. Salakhutdinov, A. Mnih, and G. Hinton, "Restricted Boltzmann machines for collaborative filtering," in Proceedings of the 24th International Conference on Machine Learning, 2007.
14. J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Transactions on Automatic Control, vol. 37, no. 3, 1992.
15. J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," Johns Hopkins APL Technical Digest, vol. 19, no. 4, 1998.
16. P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, 1990.
17. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson Education India, 2010.
18. M. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 2017.
19. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, 1989.
20. Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," in The Handbook of Brain Theory and Neural Networks, 1995.