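Using TensorFlow for Nonlinear Classification and Image Recognition
TensorFlow is a widely used open-source library in machine learning and deep learning that provides tools for building and training many kinds of neural network models. This article shows how to use TensorFlow for nonlinear classification and for image recognition on the CIFAR10 dataset.
Nonlinear Classification: Simulating the XOR Gate
We start by building an artificial neural network (ANN) that simulates a two-input exclusive-or (XOR) gate. The truth table of the XOR gate is as follows: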
| A | B | Output |
| --- | --- | --- |
| 1 | 1 | 0 |
| 0 | 1 | 1 |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
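Plotting these four points makes it clear that the two classes are not linearly separable, so a hidden layer is needed. In practice, a single hidden layer with two neurons is enough.
The network architecture is as follows. The hidden layer receives its inputs from the input layer and, using its weights and biases, its two activation functions produce two outputs. These outputs are the inputs to the output layer, whose activation function produces the predicted class of the input sample.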
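To make the data flow concrete, here is a minimal pure-Python sketch of the forward pass through such a 2-2-1 network. The weights and biases are hand-picked assumptions, chosen so that the two hidden neurons behave roughly like OR and NAND gates; they are not the values TensorFlow learns, and the function names are illustrative.

```python
import math

def sigmoid(z):
    # Logistic activation used by both layers.
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (assumed) parameters, not trained values.
HIDDEN_WEIGHTS = [[20.0, -20.0],   # weights from input A to hidden neurons 1 and 2
                  [20.0, -20.0]]   # weights from input B to hidden neurons 1 and 2
HIDDEN_BIAS = [-10.0, 30.0]        # hidden neuron 1 acts like OR, neuron 2 like NAND
OUTPUT_WEIGHTS = [20.0, 20.0]      # the output neuron acts like AND of the hidden outputs
OUTPUT_BIAS = -30.0

def xor_forward(a, b):
    # Hidden layer: sum of products plus bias, then sigmoid, for each hidden neuron.
    hidden = [sigmoid(a * HIDDEN_WEIGHTS[0][j] + b * HIDDEN_WEIGHTS[1][j] + HIDDEN_BIAS[j])
              for j in range(2)]
    # Output layer: the same computation applied to the hidden outputs.
    return sigmoid(hidden[0] * OUTPUT_WEIGHTS[0] + hidden[1] * OUTPUT_WEIGHTS[1] + OUTPUT_BIAS)

for a, b in [(1, 1), (0, 1), (0, 0), (1, 0)]:
    print(a, b, round(xor_forward(a, b)))
```

Rounding the output of each of the four input pairs reproduces the truth table, which is only possible because the hidden layer first maps the inputs into a space where the classes become separable.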
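The complete code of the ANN that simulates the two-input XOR gate is listed below. It uses the TensorFlow 1.x graph API (placeholders and sessions):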
import tensorflow
# Preparing a placeholder for the training data inputs of shape (N, 2)
training_inputs = tensorflow.placeholder(shape=[4, 2], dtype=tensorflow.float32, name="Inputs")
# Preparing a placeholder for the training data outputs of shape (N, 1)
training_outputs = tensorflow.placeholder(shape=[4, 1], dtype=tensorflow.float32, name="Outputs")
# Initializing the weights of the hidden layer of shape (2, 2)
hidden_weights = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(2,2), name="HiddenRandomWeights"), dtype=tensorflow.float32, name="HiddenWeights")
# Initializing the bias of the hidden layer of shape (1,2)
hidden_bias = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(1,2), name="HiddenRandomBias"), dtype=tensorflow.float32, name="HiddenBias")
# Calculating the SOPs by multiplying the weights matrix of the hidden layer by the data inputs matrix
hidden_sop = tensorflow.matmul(a=training_inputs, b=hidden_weights, name="HiddenSOPs")
# Adding the bias to the SOPs of the hidden layer
hidden_sop_bias = tensorflow.add(x=hidden_sop, y=hidden_bias, name="HiddenAddBias")
# Sigmoid activation function of the hidden layer outputs
hidden_sigmoid = tensorflow.nn.sigmoid(x=hidden_sop_bias, name="HiddenSigmoid")
# Initializing the weights of the output layer of shape (2, 1)
output_weights = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(2,1), name="OutputRandomWeights"), dtype=tensorflow.float32, name="OutputWeights")
# Initializing the bias of the output layer of shape (1,1)
output_bias = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(1,1), name="OutputRandomBias"), dtype=tensorflow.float32, name="OutputBias")
# Calculating the SOPs by multiplying the outputs of the hidden layer by the weights matrix of the output layer
output_sop = tensorflow.matmul(a=hidden_sigmoid, b=output_weights, name="Output_SOPs")
# Adding the bias to the SOPs of the output layer
output_sop_bias = tensorflow.add(x=output_sop, y=output_bias, name="OutputAddBias")
# Sigmoid activation function of the output layer outputs. These are the predictions.
predictions = tensorflow.nn.sigmoid(x=output_sop_bias, name="OutputSigmoid")
# Calculating the difference (error) between the ANN predictions and the correct outputs
error = tensorflow.subtract(x=training_outputs, y=predictions, name="Error")
# Square error.
square_error = tensorflow.square(x=error, name="SquareError")
# Measuring the prediction error of the network after being trained
loss = tensorflow.reduce_sum(square_error, name="Loss")
# Minimizing the prediction error using gradient descent optimizer
train_optim = tensorflow.train.GradientDescentOptimizer(learning_rate=0.01, name="GradientDescent")
minimizer = train_optim.minimize(loss, name="Minimizer")
# Training data inputs of shape (4, 2)
training_inputs_data = [[1, 0],
                        [0, 1],
                        [0, 0],
                        [1, 1]]
# Training data desired outputs
training_outputs_data = [[1.0],
                         [1.0],
                         [0.0],
                         [0.0]]
# Creating a TensorFlow Session
with tensorflow.Session() as sess:
    writer = tensorflow.summary.FileWriter(logdir="\\AhmedGad\\TensorBoard\\", graph=sess.graph)
    # Initializing the TensorFlow Variables (weights and bias)
    init = tensorflow.global_variables_initializer()
    sess.run(init)
    # Training loop of the neural network
    for step in range(100000):
        # Running the minimizer performs one gradient descent step; the op itself returns None.
        sess.run(fetches=minimizer, feed_dict={training_inputs: training_inputs_data, training_outputs: training_outputs_data})
    # Class scores of training data, followed by the trained weights and biases
    print("Expected Outputs for Train Data:\n", sess.run(fetches=[predictions, hidden_weights, output_weights, hidden_bias, output_bias], feed_dict={training_inputs: training_inputs_data}))
    writer.close()
After training completes, the samples are classified correctly. The predicted outputs are:
[[0.96982265],
[0.96998841],
[0.0275135],
[0.0380362]]
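One simple way to read these scores as classes is to threshold them at 0.5. The sketch below applies that rule to the values above; the 0.5 threshold and the variable names are assumptions, not part of the original code.

```python
# Predicted scores copied from the output above, in the order of training_inputs_data.
predicted_scores = [0.96982265, 0.96998841, 0.0275135, 0.0380362]
desired_outputs = [1.0, 1.0, 0.0, 0.0]

# A score of 0.5 or more is read as class 1, anything lower as class 0.
predicted_classes = [1 if score >= 0.5 else 0 for score in predicted_scores]
all_correct = all(p == int(d) for p, d in zip(predicted_classes, desired_outputs))
print(predicted_classes, all_correct)
```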
The network parameters after training are:
- Hidden layer weights: [[-6.27943468, -4.30125761], [-6.38489389, -4.31706429]]
- Hidden layer bias: [[-8.8601017], [8.70441246]]
- Output layer weights: [[2.49879336, 6.37831974]]
- Output layer bias: [[-4.06760359]]
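CIFAR10 Image Recognition with a Convolutional Neural Network (CNN)
Next, we use TensorFlow to build a CNN that recognizes the images of the CIFAR10 dataset.
Preparing the Training Data
The CIFAR10 dataset is available for download as a set of binary files. It contains 60,000 images split into a training set and a test set. The training data consists of five binary files with 10,000 images each, and every image is a 32×32×3 RGB image.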
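According to the dataset's documentation, each image in the Python version of these files is stored as one row of 3,072 values: the 1,024 red values first, then the green, then the blue, each channel in row-major order. The sketch below shows how the RGB triple of a pixel can be read from such a row; the helper name pixel_rgb and the synthetic row are illustrative assumptions.

```python
IM_DIM = 32                       # rows and columns per image
CHANNEL_SIZE = IM_DIM * IM_DIM    # 1024 values per color channel

def pixel_rgb(row, y, x):
    # Offset of pixel (y, x) inside a single channel stored in row-major order.
    offset = y * IM_DIM + x
    # The three channels are stored one after another: red, green, blue.
    return (row[offset], row[CHANNEL_SIZE + offset], row[2 * CHANNEL_SIZE + offset])

# Synthetic row: red values are all 10, green all 20, blue all 30.
synthetic_row = [10] * CHANNEL_SIZE + [20] * CHANNEL_SIZE + [30] * CHANNEL_SIZE
print(pixel_rgb(synthetic_row, y=5, x=7))
```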
To decode the binary files, we create the following functions:
import pickle
import os
import numpy
def unpickle_patch(file):
    # Reading the binary file and loading its details into a dictionary.
    with open(file, 'rb') as patch_bin_file:
        patch_dict = pickle.load(patch_bin_file, encoding='bytes')
    return patch_dict  # Returning the dictionary.
def get_dataset_images(dataset_path, im_dim=32, num_channels=3):
    num_files = 5  # Number of training binary files in the CIFAR10 dataset.
    images_per_file = 10000  # Number of samples within each binary file.
    files_names = os.listdir(dataset_path)  # Listing the binary files in the dataset path.
    dataset_array = numpy.zeros(shape=(num_files * images_per_file, im_dim, im_dim, num_channels))
    dataset_labels = numpy.zeros(shape=(num_files * images_per_file), dtype=numpy.uint8)
    index = 0  # Index variable to count number of training binary files being processed.
    for file_name in files_names:
        if file_name[0:len(file_name) - 1] == "data_batch_":
            print("Working on : ", file_name)
            data_dict = unpickle_patch(dataset_path + file_name)
            images_data = data_dict[b"data"]
            # Each row stores one image channel by channel (all red values, then green, then blue),
            # so reshape to (N, channels, rows, cols) and move the channel axis last to get 32x32x3 images.
            images_data_reshaped = numpy.reshape(images_data, (len(images_data), num_channels, im_dim, im_dim)).transpose(0, 2, 3, 1)
            # Appending the data of the current file after being reshaped.
            dataset_array[index * images_per_file:(index + 1) * images_per_file, :, :, :] = images_data_reshaped
            # Appending the labels of the current file.
            dataset_labels[index * images_per_file:(index + 1) * images_per_file] = data_dict[b"labels"]
            index = index + 1  # Incrementing the counter of the processed training files by 1 to accept new file.
    return dataset_array, dataset_labels  # Returning the training input data and output labels.
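Building the CNN
The dataflow graph of the CNN is created inside the create_CNN function, which builds a stack of convolution (conv), ReLU, max pooling, dropout, and fully connected (FC) layers. The architecture of the CNN is as follows: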
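graph LR
A[Input data] --> B[Conv layer 1]
B --> C[ReLU layer 1]
C --> D[Max pooling layer 1]
D --> E[Conv layer 2]
E --> F[ReLU layer 2]
F --> G[Max pooling layer 2]
G --> H[Conv layer 3]
H --> I[ReLU layer 3]
I --> J[Max pooling layer 3]
J --> K[Dropout layer]
K --> L[Fully connected layer 1]
L --> M[Fully connected layer 2]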
The code that builds the CNN is listed below:
import tensorflow
def create_conv_layer(input_data, filter_size, num_filters):
    filters = tensorflow.Variable(tensorflow.truncated_normal(shape=(filter_size, filter_size, tensorflow.cast(input_data.shape[-1], dtype=tensorflow.int32), num_filters), stddev=0.05))
    conv_layer = tensorflow.nn.conv2d(input=input_data,
                                      filter=filters,
                                      strides=[1, 1, 1, 1],
                                      padding="VALID")
    return filters, conv_layer  # Returning the filters and the convolution layer result.
def dropout_flatten_layer(previous_layer, keep_prop):
    # Applying dropout, then flattening the result so it can feed a fully connected layer.
    dropout = tensorflow.nn.dropout(x=previous_layer, keep_prob=keep_prop)
    num_features = dropout.get_shape()[1:].num_elements()
    layer = tensorflow.reshape(dropout, shape=(-1, num_features))  # Flattening the dropout output.
    return layer
def fc_layer(flattened_layer, num_inputs, num_outputs):
    fc_weights = tensorflow.Variable(tensorflow.truncated_normal(shape=(num_inputs, num_outputs), stddev=0.05))
    fc_result1 = tensorflow.matmul(flattened_layer, fc_weights)
    return fc_result1  # Output of the FC layer (result of matrix multiplication).
def create_CNN(input_data, num_classes, keep_prop):
    filters1, conv_layer1 = create_conv_layer(input_data=input_data, filter_size=7, num_filters=4)
    relu_layer1 = tensorflow.nn.relu(conv_layer1)
    max_pooling_layer1 = tensorflow.nn.max_pool(value=relu_layer1,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    filters2, conv_layer2 = create_conv_layer(input_data=max_pooling_layer1, filter_size=5, num_filters=3)
    relu_layer2 = tensorflow.nn.relu(conv_layer2)
    max_pooling_layer2 = tensorflow.nn.max_pool(value=relu_layer2,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    filters3, conv_layer3 = create_conv_layer(input_data=max_pooling_layer2, filter_size=3, num_filters=2)
    relu_layer3 = tensorflow.nn.relu(conv_layer3)
    max_pooling_layer3 = tensorflow.nn.max_pool(value=relu_layer3,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    flattened_layer = dropout_flatten_layer(previous_layer=max_pooling_layer3, keep_prop=keep_prop)
    fc_result1 = fc_layer(flattened_layer=flattened_layer, num_inputs=flattened_layer.get_shape()[1:].num_elements(),
                          num_outputs=200)
    fc_result2 = fc_layer(flattened_layer=fc_result1, num_inputs=fc_result1.get_shape()[1:].num_elements(),
                          num_outputs=num_classes)
    print("Fully connected layer results : ", fc_result2)
    return fc_result2  # Returning the result of the last FC layer.
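To see what create_CNN feeds into the first fully connected layer, the spatial size can be traced layer by layer. With VALID padding, a window of size k moved with stride s turns an input of size n into floor((n - k) / s) + 1. The helper name below is illustrative; the filter sizes and filter counts are the ones used in create_CNN.

```python
def valid_output_size(input_size, window_size, stride=1):
    # Output size along one spatial axis for VALID padding.
    return (input_size - window_size) // stride + 1

size = 32  # CIFAR10 images are 32x32
# (conv filter size, number of filters) for the three conv layers in create_CNN.
for filter_size, num_filters in [(7, 4), (5, 3), (3, 2)]:
    size = valid_output_size(size, filter_size)   # convolution, stride 1
    size = valid_output_size(size, 2)             # 2x2 max pooling, stride 1
    print("after conv", filter_size, "+ pool:", (size, size, num_filters))

flattened_features = size * size * 2  # the last conv layer has 2 filters
print("flattened features:", flattened_features)
```

Because the pooling layers use a stride of 1, each one shrinks the feature map by only one pixel per axis.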
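Training the CNN
With the computational graph of the CNN built, the next step is to train it on the training data prepared earlier. The training code is as follows: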
import tensorflow
import numpy
# Number of classes in the dataset. Used to specify the number of outputs in the last fully connected layer.
num_dataset_classes = 10
# Number of rows & columns in each input image. The image is expected to be square. Used to reshape the images and specify the input tensor shape.
im_dim = 32
# Number of channels in each input image. Used to reshape the images and specify the input tensor shape.
num_channels = 3
# Directory at which the training binary files of the CIFAR10 dataset are saved.
patches_dir = "\\AhmedGad\\cifar-10-python\\cifar-10-batches-py\\"
# Reading the CIFAR10 training binary files and returning the input data and output labels. Output labels are used to test the CNN prediction accuracy.
dataset_array, dataset_labels = get_dataset_images(dataset_path=patches_dir, im_dim=im_dim, num_channels=num_channels)
print("Size of data : ", dataset_array.shape)
# Input tensor to hold the data read above. It is the entry point of the computational graph.
# The given name of 'data_tensor' is useful for retrieving it when restoring the trained model graph for testing.
data_tensor = tensorflow.placeholder(tensorflow.float32, shape=[None, im_dim, im_dim, num_channels], name='data_tensor')
# Tensor to hold the output labels.
# The name "label_tensor" is used for accessing the tensor when testing the saved trained model after being restored.
label_tensor = tensorflow.placeholder(tensorflow.float32, shape=[None], name='label_tensor')
# The probability of keeping neurons in the dropout layer. It is given a name for accessing it later.
keep_prop = tensorflow.Variable(initial_value=0.5, name="keep_prop")
# Building the CNN architecture and returning the last layer which is the fully connected layer.
fc_result2 = create_CNN(input_data=data_tensor, num_classes=num_dataset_classes, keep_prop=keep_prop)
# Prediction probabilities of the CNN for each training sample.
# Each sample has a probability for each of the 10 classes in the dataset.
# Such a tensor is given a name for accessing it later.
softmax_propabilities = tensorflow.nn.softmax(fc_result2, name="softmax_probs")
# Prediction labels of the CNN for each training sample.
# The input sample is classified as the class of the highest probability.
# axis=1 indicates that the argmax is taken along the second axis. This returns the index of the most probable class of each sample.
softmax_predictions = tensorflow.argmax(softmax_propabilities, axis=1)
# Cross entropy of the CNN. The function expects the raw outputs of the last layer (logits) and one-hot labels.
cross_entropy = tensorflow.nn.softmax_cross_entropy_with_logits(logits=fc_result2, labels=tensorflow.one_hot(tensorflow.cast(label_tensor, tensorflow.int32), depth=num_dataset_classes))
# Summarizing the cross entropy into a single value (cost) to be minimized by the learning algorithm.
cost = tensorflow.reduce_mean(cross_entropy)
# Minimizing the network cost using the Gradient Descent optimizer with a learning rate of 0.01.
error = tensorflow.train.GradientDescentOptimizer(learning_rate=.01).minimize(cost)
# Creating a new TensorFlow Session to process the computational graph.
sess = tensorflow.Session()
# Writing summary of the graph to visualize it using TensorBoard.
tensorflow.summary.FileWriter(logdir="\\AhmedGad\\TensorBoard\\", graph=sess.graph)
# Initializing the variables of the graph.
sess.run(tensorflow.global_variables_initializer())
# Because it may be impossible to feed the complete data to the CNN on normal machines, it is recommended to split the data into a number of patches.
# A subset of the training samples is used to create each patch. Samples for each patch can be randomly selected.
num_patches = 5  # Number of patches
for patch_num in numpy.arange(num_patches):
    print("Patch : ", str(patch_num))
    percent = 80  # Percent of samples to be included in each patch.
    # Getting the input-output data of the current patch. get_patch is expected to return a random subset of the samples with their labels.
    shuffled_data, shuffled_labels = get_patch(data=dataset_array, labels=dataset_labels, percent=percent)
    # Data required for CNN operation. 1) Input images, 2) Output labels, and 3) Dropout keep probability
    cnn_feed_dict = {data_tensor: shuffled_data,
                     label_tensor: shuffled_labels,
                     keep_prop: 0.5}
    # Running the optimizer op performs one training step on the current patch.
    sess.run(error, feed_dict=cnn_feed_dict)
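The loop above calls get_patch, whose definition is not shown with the training code. A minimal sketch of such a helper, assuming it should return a random percent of the samples together with their matching labels, might look like this. The sampling strategy (without replacement, via numpy.random.permutation) is an assumption and not necessarily the original implementation.

```python
import numpy

def get_patch(data, labels, percent=70):
    # Number of samples to draw for this patch.
    num_elements = int(percent * data.shape[0] / 100)
    # Random sample indices without repetition; the same indices select data and labels,
    # so each image stays paired with its own label.
    indices = numpy.random.permutation(data.shape[0])[:num_elements]
    return data[indices], labels[indices]

# Small synthetic check with 10 fake "images" whose label equals their pixel value.
fake_data = numpy.arange(10).reshape(10, 1, 1, 1) * numpy.ones((10, 2, 2, 3))
fake_labels = numpy.arange(10)
patch_data, patch_labels = get_patch(fake_data, fake_labels, percent=80)
print(patch_data.shape, patch_labels.shape)
```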
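With the steps above, we have gone through nonlinear classification and CIFAR10 image recognition with TensorFlow. These examples show how TensorFlow can be used to build and train neural network models.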
Detailed Steps and Key Points of Training the CNN
Several key steps in the CNN training code above deserve further explanation:
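- Data preparation
  - Set the dataset path patches_dir and make sure it points to the directory that contains the CIFAR10 training binary files.
  - Use the get_dataset_images function to read and decode the training data into dataset_array and dataset_labels, which hold the image data and the corresponding labels.
- Defining placeholders and variables
  - data_tensor: placeholder for the input image data, of shape [None, im_dim, im_dim, num_channels], where None means that any number of samples is accepted.
  - label_tensor: placeholder for the output labels, of shape [None].
  - keep_prop: the probability of keeping a neuron in the dropout layer, with an initial value of 0.5.
- Building the CNN model
  - Call the create_CNN function with the input data, the number of classes, and the keep probability to build the CNN model. It returns fc_result2, the result of the last fully connected layer.
- Computing the predicted probabilities and labels
  - softmax_propabilities: the probability of each training sample belonging to each class, computed with the softmax function.
  - softmax_predictions: the predicted label of each sample, found with argmax as the class of highest probability.
- Computing the loss
  - cross_entropy: the cross entropy of the CNN, computed with softmax_cross_entropy_with_logits.
  - cost: the cross entropy summarized into a single value, which is the cost to minimize.
- Optimizing the model
  - Use the gradient descent optimizer (GradientDescentOptimizer) with a learning rate of 0.01 to minimize the network cost.
- Session and training
  - Create a TensorFlow session sess and initialize all variables.
  - Split the data into several patches (num_patches) for training, each holding a given percentage (percent) of the samples.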
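The prediction and loss steps can be illustrated on a single sample without TensorFlow. The sketch below computes the softmax probabilities, the argmax prediction, and the cross entropy against a one-hot label in plain Python; the logits and the function names are made-up examples.

```python
import math

def softmax(logits):
    # Subtracting the maximum keeps the exponentials numerically stable.
    shifted = [v - max(logits) for v in logits]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    # With a one-hot label, only the true class term of -sum(y * log(p)) remains.
    return -math.log(probabilities[true_class])

logits = [2.0, 1.0, 0.1]          # made-up outputs of the last FC layer for one sample
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))   # argmax
loss = cross_entropy(probabilities, true_class=0)
print(probabilities, predicted_class, loss)
```

Averaging this loss over all samples in a patch gives the single cost value that the optimizer minimizes.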
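Notes on the Training Process
- Processing data in patches: An ordinary machine may be unable to process the whole dataset at once, so the data is split into several patches for training. This reduces memory pressure and can make training more efficient.
- Choosing the learning rate: The learning rate is an important hyperparameter that controls how far the parameters move in each update. If it is too large, the model may overshoot the optimum; if it is too small, training becomes very slow. This example uses 0.01, but in practice a suitable value usually has to be found by experiment.
- Overfitting: To reduce overfitting, the model uses a dropout layer. During training, the dropout layer randomly drops a fraction of the neurons, which makes the model less dependent on the training data and improves its ability to generalize.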
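The learning-rate trade-off described above can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x², whose gradient is 2x, with three different learning rates; the function, the starting point, and the rates are illustrative assumptions unrelated to the CNN.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    # Minimizes f(x) = x**2, whose gradient is 2 * x, starting from `start`.
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x
    return x

for learning_rate in (0.001, 0.1, 1.1):
    print(learning_rate, gradient_descent(learning_rate))
```

The smallest rate barely moves away from the starting point, the middle one gets close to the minimum at 0, and the largest overshoots further on every step.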
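Model Evaluation and Testing
After training, the model needs to be evaluated and tested to verify its performance. The following simple example tests the trained model: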
# Assumes the training steps above have completed and sess is still open.
# Reading the CIFAR10 test batch file.
test_batch_path = "\\AhmedGad\\cifar-10-python\\cifar-10-batches-py\\test_batch"
test_dict = unpickle_patch(test_batch_path)
# Each row holds one image in channel-major order, so reshape to (N, channels, rows, cols) and move the channels last.
test_dataset_array = numpy.reshape(test_dict[b"data"], (-1, num_channels, im_dim, im_dim)).transpose(0, 2, 3, 1)
test_dataset_labels = numpy.array(test_dict[b"labels"])
# Predicting the class labels. keep_prop is fed as 1.0 so that no neurons are dropped at test time.
test_predictions = sess.run(softmax_predictions, feed_dict={data_tensor: test_dataset_array, keep_prop: 1.0})
# Accuracy is the fraction of predictions that match the labels.
test_accuracy = numpy.mean(test_predictions == test_dataset_labels)
print("Test Accuracy: ", test_accuracy)
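Overall accuracy can hide classes that the model handles poorly. As a complement, a per-class accuracy can be computed from the predicted and true labels. The sketch below uses plain Python lists and made-up labels; per_class_accuracy is a hypothetical helper, not part of the original code.

```python
from collections import defaultdict

def per_class_accuracy(predicted_labels, true_labels):
    # Counting, for every true class, how many samples exist and how many were predicted correctly.
    totals = defaultdict(int)
    correct = defaultdict(int)
    for predicted, true in zip(predicted_labels, true_labels):
        totals[true] += 1
        if predicted == true:
            correct[true] += 1
    return {label: correct[label] / totals[label] for label in totals}

# Made-up labels for illustration.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
print(per_class_accuracy(predicted_labels, true_labels))
```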
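Summary
This article showed how to use TensorFlow for nonlinear classification and for CIFAR10 image recognition. Building an artificial neural network (ANN) that simulates the XOR gate demonstrated how TensorFlow handles a nonlinear classification problem. The article then walked through image recognition on the CIFAR10 dataset with a convolutional neural network (CNN), covering data preparation, model building, training, and testing.
TensorFlow provides a rich set of tools and functions that make building and training neural network models simpler and more efficient. In practice, the model architecture and hyperparameters can be adjusted to the problem at hand to obtain better performance.
Future Outlook
As deep learning continues to develop, TensorFlow keeps being updated and improved. Further improvements can be expected in areas such as:
1. More efficient training algorithms: optimization algorithms that reduce training time and the consumption of computing resources.
2. More powerful model architectures: more sophisticated network architectures that improve performance and generalization.
3. Better visualization tools: more intuitive and capable tools that help users understand and debug their models.
4. Cross-platform support: support for more platforms and devices, so that TensorFlow is easier to use in different environments.
In short, TensorFlow has broad prospects in deep learning and can be applied to a wide range of complex problems.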