Let’s explain Recurrent Neural Networks (RNNs) with an example. Suppose we are given the following sequence of images and asked to guess the last two. How would you figure them out? If you examine the shapes, you’ll notice that after every rectangle comes a square. So you can easily predict that the last two images are a square and a rectangle.
How Do ML Algorithms Know That?
But when we ask a machine learning algorithm to make this prediction, how can it make a good guess? The algorithm is expected to learn from the previous steps, for instance, that after any rectangle comes a square, and after any square comes a rectangle. Traditional neural networks (TNNs) fail to predict the last two shapes because they cannot capture the order of the instances. They only care about extracting features, so any sequence containing rectangles and squares looks the same to them regardless of how the shapes are arranged. Three rectangles in a row followed by three squares in a row would be classified the same way as a sequence in which rectangles and squares alternate.
We think about new issues based on what we already know and have learned. For example, when we start reading an article, we understand each word based on concepts we’ve previously learned. Our thoughts are continuous; we use our past knowledge to tackle new challenges we encounter. However, a traditional neural network (TNN) can’t operate this way, which is a significant drawback.
Let’s consider another example. Suppose we start watching a movie. As it progresses, we understand more about what’s happening because we remember the events from the beginning up to the current moment. This continuous flow of thought is easy for us, but how can a neural network make sense of current events without any information about previous ones? This is where Recurrent Neural Networks (RNNs) come into play. One of the best-known RNN variants is the Long Short-Term Memory (LSTM) network.
How Do RNNs Work?
The effectiveness of RNNs stems from their use of a concept called sequential memory. To better understand this, think of the alphabet and try to recite it from the beginning. It’s easy because we’ve practiced this sequence many times. Now try reciting the alphabet backward; it becomes challenging, right? This is because we’ve practiced this sequence less. This illustrates sequential memory. Sequential memory is a mechanism that helps us recognize patterns in sequences.
So far, we understand that RNNs use the concept of sequential memory, but how? Let’s examine the structure of an RNN to answer this question.
Structure of Recurrent Neural Networks (RNN)
Let’s start with a traditional neural network. A traditional neural network consists of an input layer, hidden layers, and an output layer. But how can we make this network use previous information to influence how it processes the new information it receives? How about adding a loop to the network to do exactly that?
The above diagram shows the structure of an RNN. This network includes a looping mechanism that acts like a highway, allowing information to be passed from one step to the next.
Overall, an RNN is similar to a traditional neural network but consists of a chain of repeating units that pass information to one another. To see this, look at the diagram above, where the loop has been unfolded. As shown, the network first receives the input u_{k-1} from the sequence and produces the hidden output h_k. This output, along with the next input u_k, is fed into the network at the next step. The process continues until the final output is reached, and it is this mechanism that lets the network remember context during training.
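To make the recurrence concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. The parameter names (W_u, W_h, b), the tanh activation, and the sizes are illustrative assumptions rather than details taken from the diagram; the point is only that every step combines the current input u_k with the previous hidden state.

```python
import numpy as np

def rnn_forward(inputs, hidden_size, seed=0):
    """Run a vanilla RNN over a sequence of input vectors.

    inputs: array of shape (seq_len, input_size)
    Returns one hidden state per step, shape (seq_len, hidden_size).
    """
    rng = np.random.default_rng(seed)
    input_size = inputs.shape[1]

    # Illustrative, randomly initialized parameters (in practice these are learned).
    W_u = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
    W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights (the "loop")
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)  # initial hidden state
    states = []
    for u_k in inputs:
        # Each step combines the current input with the previous hidden state:
        # h_k = tanh(W_u u_k + W_h h_{k-1} + b)
        h = np.tanh(W_u @ u_k + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Example: a sequence of 5 inputs, each a 3-dimensional vector.
sequence = np.random.default_rng(1).normal(size=(5, 3))
hidden_states = rnn_forward(sequence, hidden_size=4)
print(hidden_states.shape)  # (5, 4): one hidden state per step
```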
Due to their chain-like structure, RNNs can work with sequences and lists. Whenever a model needs context from earlier inputs to produce the desired output, an RNN is a natural choice. For example, RNNs can be used for next-word prediction, image captioning, speech recognition, time series anomaly detection, and stock market prediction.
Application and Functioning of RNNs
One of my favorite applications of RNNs is sign language detection. Videos of people signing different words can be fed to RNN models, such as Long Short-Term Memory (LSTM) networks, to recognize which word is being signed. In this problem, the order of the frames matters significantly, which makes it a very practical example of RNNs at work. There are many other applications, such as video classification and time series analysis, where RNNs prove highly effective.
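As a hedged sketch of how such a model might be wired up (the layer sizes, feature dimensions, and class count below are made-up values, and the per-frame features are assumed to come from some earlier feature extractor), an LSTM can read the frames of a clip in order and classify the whole clip as one sign word:

```python
import torch
import torch.nn as nn

class SignWordClassifier(nn.Module):
    """Classify a clip (a sequence of per-frame feature vectors) as one sign word."""

    def __init__(self, frame_features=64, hidden_size=128, num_words=10):
        super().__init__()
        self.lstm = nn.LSTM(frame_features, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_words)

    def forward(self, frames):
        # frames: (batch, num_frames, frame_features); the order of the frames matters.
        _, (h_last, _) = self.lstm(frames)   # final hidden state summarizes the clip
        return self.classifier(h_last[-1])   # logits over the sign-word classes

# Example: a batch of 2 clips, each with 30 frames of 64 features.
model = SignWordClassifier()
clips = torch.randn(2, 30, 64)
print(model(clips).shape)  # torch.Size([2, 10])
```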
Types of Recurrent Neural Networks
Various types of RNNs have been developed over the years, including the following; a short code sketch after the list shows how a few of them can be instantiated in practice:
- Vanilla RNNs: The simplest type of RNN, which takes the current input and the previous hidden state as input and produces the current output and next hidden state as output.
- Long Short-Term Memory (LSTM) Networks: A type of RNN with an internal memory cell and three gates (input gate, forget gate, and output gate) that control the flow of information. LSTMs are designed to mitigate the vanishing gradient problem that plain RNNs suffer from when trained over long sequences.
- Gated Recurrent Unit (GRU) Networks: Similar to LSTMs but with a simplified architecture that combines the forget and input gates into a single update gate. GRUs are computationally more efficient and easier to train than LSTMs.
- Bidirectional RNNs: A type of RNN that processes the input sequence in both forward and backward directions, allowing the network to consider both past and future information when making predictions.
- Encoder-Decoder RNNs: An architecture commonly used in sequence-to-sequence models, such as machine translation and speech recognition. The encoder processes the input sequence and produces a fixed-length vector representation, which is then input to the decoder to produce the output sequence.
- Attention-Based RNNs: An architecture that uses an attention mechanism to focus on the most relevant parts of the input sequence when making predictions. This allows the network to selectively attend to different parts of the input sequence based on the current context.
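As a rough sketch, and assuming PyTorch as the framework (the sizes below are arbitrary), the first four variants above map directly onto built-in layers; encoder-decoder and attention-based architectures are usually built by composing such layers rather than by a single module:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 16, 32
x = torch.randn(4, 10, input_size)  # a batch of 4 sequences, each 10 steps long

vanilla = nn.RNN(input_size, hidden_size, batch_first=True)     # vanilla RNN
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)       # gated memory cell
gru = nn.GRU(input_size, hidden_size, batch_first=True)         # simplified gating
bi_lstm = nn.LSTM(input_size, hidden_size, batch_first=True,
                  bidirectional=True)                           # forward + backward passes

out, _ = bi_lstm(x)
print(out.shape)  # torch.Size([4, 10, 64]): the two directions are concatenated
```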