tf.keras.layers.LSTM in TensorFlow

The tf.keras.layers.LSTM layer is a built-in TensorFlow layer designed to handle sequential data efficiently. It is widely used for applications like:

Text Generation
Machine Translation
Stock Price Prediction
Speech Recognition
Time-Series Forecasting

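Long Short-Term Memory (LSTM) networks address the main limitation of standard Recurrent Neural Networks (RNNs), which is that gradients vanish over long sequences. They do so by adding a cell state controlled by three gates (forget, input, and output). At each time step the gates decide what to discard, what to store, and what to expose, which helps the network retain important information over long sequences.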
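
To make the role of the gates concrete, here is a minimal NumPy sketch of a single LSTM time step. It is an illustration under stated assumptions, not TensorFlow's actual implementation: the helper name lstm_step, the random weights, and the way the four weight blocks are packed into one matrix are all choices made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single sequence (illustrative helper, not a TensorFlow API)."""
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b            # pre-activations, shape (4 * units,)
    i = sigmoid(z[0 * units:1 * units])     # input gate: how much new information to store
    f = sigmoid(z[1 * units:2 * units])     # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * units:3 * units])     # candidate values for the cell state
    o = sigmoid(z[3 * units:4 * units])     # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g                # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                  # updated hidden state (the step's output)
    return h_t, c_t

rng = np.random.default_rng(0)
features, units = 5, 3
W = rng.normal(size=(features, 4 * units))  # input weights (random, for illustration only)
U = rng.normal(size=(units, 4 * units))     # recurrent weights
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for x_t in rng.normal(size=(10, features)): # walk through 10 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)  # (3,) (3,)
```

The cell state c is updated additively (f * c_prev + i * g), which is what lets information survive across many steps instead of being squashed at every step as in a plain RNN.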

Syntax of tf.keras.layers.LSTM

tf.keras.layers.LSTM(
    units,
    activation='tanh',
    recurrent_activation='sigmoid',
    kernel_initializer='glorot_uniform',
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    dropout=0.0,
    recurrent_dropout=0.0,
    stateful=False,
    unroll=False
)

Parameters of tf.keras.layers.LSTM:

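units – Dimensionality of the hidden state and of the layer's output (a positive integer).
activation – Activation applied to the candidate cell values and to the cell state when computing the output (default: 'tanh').
recurrent_activation – Activation used for the forget, input, and output gates (default: 'sigmoid').
return_sequences – If True, returns the output at every time step; if False (default), returns only the last output.
return_state – If True, returns the final hidden state and cell state along with the output.
go_backwards – If True, processes the input sequence in reverse order.
stateful – If True, the final state of each sample in a batch is used as the initial state of the sample at the same index in the next batch.
dropout – Fraction of units to drop in the transformation of the inputs (a float between 0 and 1).
recurrent_dropout – Fraction of units to drop in the transformation of the recurrent state (a float between 0 and 1).
kernel_initializer – Initializer for the input weight matrix (default: 'glorot_uniform').
unroll – If True, unrolls the recurrent loop; this can speed up short sequences at the cost of more memory.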
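
As a quick illustration of how a few of these parameters change the layer's behaviour, the sketch below compares output shapes with and without return_sequences and checks that dropout is inactive at inference time. The layer sizes and the input tensor are arbitrary values chosen for this example.

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([4, 10, 5])  # (batch, time_steps, features)

# return_sequences controls whether every time step or only the last one is returned
last_only = tf.keras.layers.LSTM(8)
full_seq = tf.keras.layers.LSTM(8, return_sequences=True)
print(last_only(x).shape)  # (4, 8)
print(full_seq(x).shape)   # (4, 10, 8)

# dropout and recurrent_dropout are only applied when training=True,
# so two inference calls on the same input give the same result
regularized = tf.keras.layers.LSTM(8, dropout=0.5, recurrent_dropout=0.2)
infer_a = regularized(x, training=False)
infer_b = regularized(x, training=False)
print(np.allclose(infer_a.numpy(), infer_b.numpy()))  # True
```

During model.fit the training flag is set automatically, so the dropout masks are applied there without any extra code.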

How to Use tf.keras.layers.LSTM in TensorFlow?

Let's learn to use LSTMs in TensorFlow, covering key parameters like return_sequences and return_state. You'll also understand how LSTMs process sequences and retain long-term dependencies through hidden and cell states.

1. Import Required Libraries

Python

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import numpy as np

2. Create Dummy Sequential Data

Python

# Generate random data: 100 samples, 10 time steps, 5 features per step
X = np.random.random((100, 10, 5))
y = np.random.randint(2, size=(100, 1))  # random binary labels (0 or 1)
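
With real data, the same (samples, time_steps, features) layout usually has to be built from a longer recording. The following is one possible sketch, assuming the raw data is a 2D array of shape (length, features); make_windows is a hypothetical helper written for this example, not part of NumPy or TensorFlow.

```python
import numpy as np

def make_windows(series, time_steps):
    """Stack overlapping windows of a (length, features) array (hypothetical helper)."""
    n_windows = series.shape[0] - time_steps + 1
    return np.stack([series[i:i + time_steps] for i in range(n_windows)])

series = np.arange(60, dtype=np.float32).reshape(12, 5)  # 12 readings, 5 features each
windows = make_windows(series, time_steps=10)
print(windows.shape)  # (3, 10, 5)
```

Each window becomes one sample, so 12 readings with a window of 10 time steps yield 3 samples that an LSTM layer can consume directly.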

3. Build an LSTM Model

Python

model = Sequential([
    tf.keras.Input(shape=(10, 5)),  # 10 time steps, 5 features per step
    LSTM(50, activation='tanh', return_sequences=True),  # First LSTM layer; passes the full sequence on
    LSTM(30, activation='tanh'),  # Second LSTM layer; returns only the last output
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
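
The parameter counts that model.summary() reports can be reproduced by hand. The sketch below assumes the default use_bias=True; lstm_params and dense_params are helper names invented for this example. An LSTM layer holds four blocks of weights (three gates plus the candidate cell update), each with an input matrix, a recurrent matrix, and a bias vector.

```python
def lstm_params(input_dim, units):
    # 4 weight blocks, each with input weights, recurrent weights, and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # one weight per input-output pair plus one bias per output
    return input_dim * units + units

first = lstm_params(5, 50)    # first LSTM layer: 5 input features, 50 units
second = lstm_params(50, 30)  # second LSTM layer: 50 inputs from the first layer, 30 units
head = dense_params(30, 1)    # output layer
print(first, second, head, first + second + head)  # 11200 9720 31 20951
```

The recurrent term units * units is why the parameter count grows quickly as the number of units increases.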

4. Train the Model

Python

model.fit(X, y, epochs=10, batch_size=16)

Output (exact values vary from run to run; accuracy stays close to chance because the labels are random):

Epoch 1/10
7/7 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.5260 - loss: 0.6946
.
.
.
Epoch 10/10
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.5830 - loss: 0.6830
<keras.src.callbacks.history.History at 0x7968ee53b250>
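
The History object returned by fit stores the per-epoch metrics, and the trained model can then be used for prediction. The sketch below is self-contained and uses a smaller model and fewer epochs than the walkthrough so that it runs quickly; those sizes and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np
import tensorflow as tf

X = np.random.random((100, 10, 5)).astype("float32")
y = np.random.randint(2, size=(100, 1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 5)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit returns a History object; history.history maps metric names to per-epoch lists
history = model.fit(X, y, epochs=2, batch_size=16, verbose=0)
print(history.history["loss"])

# predict returns one probability per sample; threshold it to get class labels
probs = model.predict(X[:3], verbose=0)
labels = (probs > 0.5).astype(int)
print(probs.shape, labels.ravel())
```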

Understanding return_sequences and return_state

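return_sequences=True → Returns the output for each time step instead of just the final one. This is needed when stacking LSTM layers, because the next LSTM layer expects a 3D input of shape (batch, time_steps, features).
return_state=True → Returns the final hidden state and final cell state along with the output.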

Example:

Python

lstm_layer = LSTM(50, return_sequences=True, return_state=True)
output, hidden_state, cell_state = lstm_layer(tf.random.normal([5, 10, 8]))  # (batch_size=5, time_steps=10, features=8)
print(output.shape, hidden_state.shape, cell_state.shape)

Output:

(5, 10, 50) (5, 50) (5, 50)

This means:

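The output has shape (5, 10, 50): a 50-dimensional vector for each of the 10 time steps of each of the 5 sequences in the batch.
The hidden and cell states have shape (5, 50): one 50-dimensional vector per sequence, taken after the final time step.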
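
One way to check how these tensors relate is to compare the last time step of the output with the returned hidden state. A minimal sketch, using the same layer configuration and input shape as the example above:

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.LSTM(50, return_sequences=True, return_state=True)
output, hidden_state, cell_state = layer(tf.random.normal([5, 10, 8]))

# The final hidden state is the output at the last time step
last_step = output[:, -1, :]
print(np.allclose(last_step.numpy(), hidden_state.numpy()))  # True
```

The cell state, by contrast, never appears in the output sequence; it is only available through return_state=True.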

TensorFlow’s tf.keras.layers.LSTM handles sequential data with flexible options for returning sequences and states, dropout regularization, and reverse-order processing via go_backwards (bidirectional processing is available through the tf.keras.layers.Bidirectional wrapper). Whether you're working on NLP, finance, or speech recognition, LSTMs remain a practical choice for capturing long-term dependencies.
