19. TensorFlow for Nonlinear Classification and Image Recognition

Latest recommended article published on 2025-09-12 09:52:46

This article is part of the column "Applications of Deep Learning and CNNs in Computer Vision".

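TensorFlow is a widely used open-source library in machine learning and deep learning that provides tools for building and training many kinds of neural network models. This article shows how to use TensorFlow for nonlinear classification and for image recognition on the CIFAR10 dataset.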

Nonlinear Classification: Simulating the XOR Gate

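We start by building an artificial neural network (ANN) that simulates an exclusive-or (XOR) gate with two inputs. The truth table of the XOR gate is:
| A | B | Output |
| --- | --- | --- |
| 1 | 1 | 0 |
| 0 | 1 | 1 |
| 0 | 0 | 0 |
| 1 | 0 | 1 |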

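Plotting the four input points makes it clear that the two classes are not linearly separable: no single straight line puts both 1-outputs on one side and both 0-outputs on the other. A hidden layer is therefore required. In practice, a single hidden layer with two neurons is enough for this problem.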

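The network architecture is as follows: the hidden layer receives the values of the input layer and, using its weights and biases, its two activation functions produce two outputs. These outputs feed the output layer, which applies its own activation function to produce the final predicted class of the input sample.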
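To make the 2-2-1 architecture concrete, here is a minimal NumPy sketch of its forward pass. The weights and biases are hand-picked for illustration, not values from a trained network: one hidden neuron is set up to behave like OR, the other like AND, and the output neuron combines them as "OR and not AND". The names sigmoid and forward are hypothetical helpers.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass of a 2-2-1 network with sigmoid activations."""
    hidden = sigmoid(inputs @ hidden_w + hidden_b)   # shape (N, 2)
    return sigmoid(hidden @ output_w + output_b)     # shape (N, 1)

# Hand-picked (not trained) parameters. Each column of hidden_w is one neuron:
# column 0 with bias -10 acts like OR, column 1 with bias -30 acts like AND.
hidden_w = np.array([[20.0, 20.0],
                     [20.0, 20.0]])
hidden_b = np.array([[-10.0, -30.0]])
# The output neuron fires when OR is on and AND is off, which is XOR.
output_w = np.array([[20.0], [-20.0]])
output_b = np.array([[-10.0]])

# Same sample order as the training data used in the TensorFlow listing.
xor_inputs = np.array([[1, 0], [0, 1], [0, 0], [1, 1]], dtype=float)
xor_scores = forward(xor_inputs, hidden_w, hidden_b, output_w, output_b)
print(np.round(xor_scores).ravel())
```

Gradient descent is free to find a different set of weights; the point is only that two hidden sigmoid neurons are sufficient to represent XOR.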

The complete code of the ANN that simulates the two-input XOR gate follows:

# Note: this listing uses the TensorFlow 1.x graph API (placeholder, Session).
import tensorflow

# Preparing a placeholder for the training data inputs of shape (4, 2)
training_inputs = tensorflow.placeholder(shape=[4, 2], dtype=tensorflow.float32, name="Inputs")
# Preparing a placeholder for the training data outputs of shape (4, 1)
training_outputs = tensorflow.placeholder(shape=[4, 1], dtype=tensorflow.float32, name="Outputs")
# Initializing the weights of the hidden layer of shape (2, 2)
hidden_weights = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(2,2), name="HiddenRandomWeights"), dtype=tensorflow.float32, name="HiddenWeights")
# Initializing the bias of the hidden layer of shape (1,2)
hidden_bias = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(1,2), name="HiddenRandomBias"), dtype=tensorflow.float32, name="HiddenBias")
# Calculating the SOPs by multiplying the weights matrix of the hidden layer by the data inputs matrix
hidden_sop = tensorflow.matmul(a=training_inputs, b=hidden_weights, name="HiddenSOPs")
# Adding the bias to the SOPs of the hidden layer
hidden_sop_bias = tensorflow.add(x=hidden_sop, y=hidden_bias, name="HiddenAddBias")
# Sigmoid activation function of the hidden layer outputs
hidden_sigmoid = tensorflow.nn.sigmoid(x=hidden_sop_bias, name="HiddenSigmoid")
# Initializing the weights of the output layer of shape (2, 1)
output_weights = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(2,1), name="OutputRandomWeights"), dtype=tensorflow.float32, name="OutputWeights")
# Initializing the bias of the output layer of shape (1,1)
output_bias = tensorflow.Variable(initial_value=tensorflow.truncated_normal(shape=(1,1), name="OutputRandomBias"), dtype=tensorflow.float32, name="OutputBias")
# Calculating the SOPs by multiplying the outputs of the hidden layer by the weights matrix of the output layer
output_sop = tensorflow.matmul(a=hidden_sigmoid, b=output_weights, name="Output_SOPs")
# Adding the bias to the SOPs of the output layer
output_sop_bias = tensorflow.add(x=output_sop, y=output_bias, name="OutputAddBias")
# Sigmoid activation function of the output layer outputs. These are the predictions.
predictions = tensorflow.nn.sigmoid(x=output_sop_bias, name="OutputSigmoid")
# Calculating the difference (error) between the ANN predictions and the correct outputs
error = tensorflow.subtract(x=training_outputs, y=predictions, name="Error")
# Square error.
square_error = tensorflow.square(x=error, name="SquareError")
# Summing the squared errors over all training samples to get the loss
loss = tensorflow.reduce_sum(square_error, name="Loss")
# Minimizing the prediction error using gradient descent optimizer
train_optim = tensorflow.train.GradientDescentOptimizer(learning_rate=0.01, name="GradientDescent")
minimizer = train_optim.minimize(loss, name="Minimizer")
# Training data inputs of shape (4, 2)
training_inputs_data = [[1, 0],
                        [0, 1],
                        [0, 0],
                        [1, 1]]
# Training data desired outputs
training_outputs_data = [[1.0],
                         [1.0],
                         [0.0],
                         [0.0]]
# Creating a TensorFlow Session
with tensorflow.Session() as sess:
    writer = tensorflow.summary.FileWriter(logdir="\\AhmedGad\\TensorBoard\\", graph=sess.graph)
    # Initializing the TensorFlow Variables (weights and bias)
    init = tensorflow.global_variables_initializer()
    sess.run(init)
    # Training loop of the neural network
    for step in range(100000):
        # The minimizer op returns None, so its result is not printed.
        sess.run(fetches=minimizer, feed_dict={training_inputs: training_inputs_data, training_outputs: training_outputs_data})
    # Predictions for the training data, followed by the trained weights and biases
    print("Predictions and trained parameters:\n", sess.run(fetches=[predictions, hidden_weights, output_weights, hidden_bias, output_bias], feed_dict={training_inputs: training_inputs_data}))
    writer.close()

After training completes, the samples are classified correctly. The predicted outputs are:

[[0.96982265],
 [0.96998841],
 [0.0275135],
 [0.0380362]]
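The sigmoid outputs above are scores between 0 and 1, not class labels. As a small sketch, the snippet below turns them into classes with a 0.5 threshold (an assumed but conventional cut-off) and recomputes the sum-of-squares loss used in the listing from the reported predictions.

```python
import numpy as np

# Predictions reported above and the desired outputs from the training data.
predictions = np.array([[0.96982265], [0.96998841], [0.0275135], [0.0380362]])
desired = np.array([[1.0], [1.0], [0.0], [0.0]])

# Assumed decision rule: a score of at least 0.5 means class 1.
predicted_classes = (predictions >= 0.5).astype(int)

# Same loss as in the listing: the sum of squared errors over all samples.
sum_squared_error = float(np.sum((desired - predictions) ** 2))

print(predicted_classes.ravel(), sum_squared_error)
```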

The network parameters after training are:
- Hidden layer weights: [[-6.27943468, -4.30125761], [-6.38489389, -4.31706429]]
- Hidden layer bias: [[-8.8601017], [8.70441246]]
- Output layer weights: [[2.49879336, 6.37831974]]
- Output layer bias: [[-4.06760359]]

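CIFAR10 Image Recognition with a Convolutional Neural Network (CNN)

Next, we use TensorFlow to build a CNN that recognizes images from the CIFAR10 dataset.

Preparing the Training Data

The binary data of the CIFAR10 dataset is available for download. The dataset has 60,000 images, split into a training set and a test set. The training data consists of five binary files with 10,000 images each, and every image is a 32×32×3 RGB image.

To decode the binary files, we create the following functions: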

import pickle
import os
import numpy

def unpickle_patch(file):
    # Reading the binary file; the with block closes it even if loading fails.
    with open(file, 'rb') as patch_bin_file:
        patch_dict = pickle.load(patch_bin_file, encoding='bytes')  # Loading the details of the binary file into a dictionary.
    return patch_dict  # Returning the dictionary.

def get_dataset_images(dataset_path, im_dim=32, num_channels=3):
    num_files = 5  # Number of training binary files in the CIFAR10 dataset.
    images_per_file = 10000  # Number of samples within each binary file.
    files_names = os.listdir(dataset_path)  # Listing the binary files in the dataset path.
    dataset_array = numpy.zeros(shape=(num_files * images_per_file, im_dim, im_dim, num_channels))
    dataset_labels = numpy.zeros(shape=(num_files * images_per_file), dtype=numpy.uint8)
    index = 0  # Index variable to count number of training binary files being processed.
    for file_name in files_names:
        if file_name[0:len(file_name) - 1] == "data_batch_":
            print("Working on : ", file_name)
            data_dict = unpickle_patch(os.path.join(dataset_path, file_name))
            images_data = data_dict[b"data"]
            # Each row stores the red, then green, then blue plane (1024 values each), so reshape to
            # (N, channels, rows, cols) first and then move the channel axis last to get 32x32x3 images.
            images_data_reshaped = images_data.reshape(len(images_data), num_channels, im_dim, im_dim).transpose(0, 2, 3, 1)
            # Appending the data of the current file after being reshaped.
            dataset_array[index * images_per_file:(index + 1) * images_per_file, :, :, :] = images_data_reshaped
            # Appending the labels of the current file.
            dataset_labels[index * images_per_file:(index + 1) * images_per_file] = data_dict[b"labels"]
            index = index + 1  # Incrementing the counter of the processed training files by 1 to accept new file.
    return dataset_array, dataset_labels  # Returning the training input data and output labels.
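The reshaping step depends on how each image is laid out inside the batch files. In the Python version of CIFAR10, every row of the data array holds 3,072 values: 1,024 for the red plane, then 1,024 for green, then 1,024 for blue. The sketch below uses a synthetic row (not real CIFAR10 data) to show how such a row becomes a channels-last 32×32×3 image.

```python
import numpy as np

im_dim, num_channels = 32, 3

# Synthetic stand-in for one row of data_dict[b"data"]: every red value is 10,
# every green value is 20, every blue value is 30, so the planes are easy to spot.
row = np.concatenate([np.full(1024, 10), np.full(1024, 20), np.full(1024, 30)]).astype(np.uint8)
batch = row[np.newaxis, :]  # shape (1, 3072), like a batch holding a single image

# Split each row into its three planes, then move the channel axis to the end.
images = batch.reshape(len(batch), num_channels, im_dim, im_dim).transpose(0, 2, 3, 1)

print(images.shape)     # (1, 32, 32, 3)
print(images[0, 0, 0])  # [10 20 30], the RGB triple of the top-left pixel
```

Reshaping a row directly to (32, 32, 3) would instead interleave values from the same color plane as if they were the R, G, and B components of one pixel.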

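Building the CNN

The dataflow graph of the CNN is created in the create_CNN function, which builds a stack of convolution (conv), ReLU, max pooling, dropout, and fully connected (FC) layers. The architecture of the CNN is:

graph LR
    A[Input data] --> B[Conv layer 1]
    B --> C[ReLU layer 1]
    C --> D[Max pooling layer 1]
    D --> E[Conv layer 2]
    E --> F[ReLU layer 2]
    F --> G[Max pooling layer 2]
    G --> H[Conv layer 3]
    H --> I[ReLU layer 3]
    I --> J[Max pooling layer 3]
    J --> K[Dropout layer]
    K --> L[FC layer 1]
    L --> M[FC layer 2]

The code that builds the CNN follows: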

import tensorflow

def create_conv_layer(input_data, filter_size, num_filters):
    filters = tensorflow.Variable(tensorflow.truncated_normal(shape=(filter_size, filter_size, tensorflow.cast(input_data.shape[-1], dtype=tensorflow.int32), num_filters), stddev=0.05))
    conv_layer = tensorflow.nn.conv2d(input=input_data,
                                      filter=filters,
                                      strides=[1, 1, 1, 1],
                                      padding="VALID")
    return filters, conv_layer  # Returning the filters and the convolution layer result.

def dropout_flatten_layer(previous_layer, keep_prop):
    # Randomly dropping activations; keep_prop is the probability of keeping each one.
    dropout = tensorflow.nn.dropout(x=previous_layer, keep_prob=keep_prop)
    num_features = dropout.get_shape()[1:].num_elements()
    # Flattening the dropout output (flattening previous_layer here would bypass dropout).
    layer = tensorflow.reshape(dropout, shape=(-1, num_features))
    return layer

def fc_layer(flattened_layer, num_inputs, num_outputs):
    fc_weights = tensorflow.Variable(tensorflow.truncated_normal(shape=(num_inputs, num_outputs), stddev=0.05))
    fc_result1 = tensorflow.matmul(flattened_layer, fc_weights)
    return fc_result1  # Output of the FC layer (result of matrix multiplication).

def create_CNN(input_data, num_classes, keep_prop):
    filters1, conv_layer1 = create_conv_layer(input_data=input_data, filter_size=7, num_filters=4)
    relu_layer1 = tensorflow.nn.relu(conv_layer1)
    max_pooling_layer1 = tensorflow.nn.max_pool(value=relu_layer1,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    filters2, conv_layer2 = create_conv_layer(input_data=max_pooling_layer1, filter_size=5, num_filters=3)
    relu_layer2 = tensorflow.nn.relu(conv_layer2)
    max_pooling_layer2 = tensorflow.nn.max_pool(value=relu_layer2,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    filters3, conv_layer3 = create_conv_layer(input_data=max_pooling_layer2, filter_size=3, num_filters=2)
    relu_layer3 = tensorflow.nn.relu(conv_layer3)
    max_pooling_layer3 = tensorflow.nn.max_pool(value=relu_layer3,
                                                ksize=[1, 2, 2, 1],
                                                strides=[1, 1, 1, 1],
                                                padding="VALID")
    flattened_layer = dropout_flatten_layer(previous_layer=max_pooling_layer3, keep_prop=keep_prop)
    fc_result1 = fc_layer(flattened_layer=flattened_layer, num_inputs=flattened_layer.get_shape()[1:].num_elements(),
                          num_outputs=200)
    fc_result2 = fc_layer(flattened_layer=fc_result1, num_inputs=fc_result1.get_shape()[1:].num_elements(),
                          num_outputs=num_classes)
    print("Fully connected layer results : ", fc_result2)
    return fc_result2  # Returning the result of the last FC layer.
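The spatial size of the feature maps in create_CNN can be traced by hand. With "VALID" padding, a window of size k moved with stride s over an input of size n produces floor((n - k) / s) + 1 outputs. The sketch below applies that formula to the filter sizes (7, 5, 3) and the 2×2, stride-1 pooling used in the listing; valid_output_size is a hypothetical helper name.

```python
def valid_output_size(input_size, window_size, stride=1):
    """Output length of a VALID (no padding) convolution or pooling window."""
    return (input_size - window_size) // stride + 1

size = 32  # CIFAR10 images are 32x32
for filter_size in (7, 5, 3):
    size = valid_output_size(size, filter_size)  # convolution, stride 1
    size = valid_output_size(size, 2)            # 2x2 max pooling, stride 1
    print("after conv", filter_size, "and pooling:", size)

# The last conv layer has 2 filters, so the flattened vector has size*size*2 values.
num_features = size * size * 2
print("flattened features:", num_features)
```

Because the pooling stride is 1, each pooling layer shrinks the feature map by only one pixel per side, so the map goes from 32 to 17 rather than being halved three times.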

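Training the CNN

Once the computational graph of the CNN is built, the next step is to train it on the training data prepared earlier. The training code follows: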

import tensorflow
import numpy

# Number of classes in the dataset. Used to specify the number of outputs in the last fully connected layer.
num_dataset_classes = 10
# Number of rows & columns in each input image. The image is expected to be square. Used to reshape the images and specify the input tensor shape.
im_dim = 32
# Number of channels in each input image. Used to reshape the images and specify the input tensor shape.
num_channels = 3
# Directory at which the training binary files of the CIFAR10 dataset are saved.
patches_dir = "\\AhmedGad\\cifar-10-python\\cifar-10-batches-py\\"
# Reading the CIFAR10 training binary files and returning the input data and output labels. Output labels are used to test the CNN prediction accuracy.
dataset_array, dataset_labels = get_dataset_images(dataset_path=patches_dir, im_dim=im_dim, num_channels=num_channels)
print("Size of data : ", dataset_array.shape)
# Input tensor to hold the data read in the preceding. It is the entry point of the computational graph.
# The given name of 'data_tensor' is useful for retrieving it when restoring the trained model graph for testing.
data_tensor = tensorflow.placeholder(tensorflow.float32, shape=[None, im_dim, im_dim, num_channels], name='data_tensor')
# Tensor to hold the outputs label.
# The name "label_tensor" is used for accessing the tensor when testing the saved trained model after being restored.
label_tensor = tensorflow.placeholder(tensorflow.float32, shape=[None], name='label_tensor')
# The probability of keeping neurons in the dropout layer. It is given a name for accessing it later.
# It is marked non-trainable so the optimizer does not try to update it.
keep_prop = tensorflow.Variable(initial_value=0.5, trainable=False, name="keep_prop")
# Building the CNN architecture and returning the last layer which is the fully connected layer.
fc_result2 = create_CNN(input_data=data_tensor, num_classes=num_dataset_classes, keep_prop=keep_prop)
# Prediction probabilities of the CNN for each training sample.
# Each sample has a probability for each of the 10 classes in the dataset.
# Such a tensor is given a name for accessing it later.
softmax_propabilities = tensorflow.nn.softmax(fc_result2, name="softmax_probs")
# Prediction labels of the CNN for each training sample.
# The input sample is classified as the class of the highest probability.
# axis=1 indicates that the index of the maximum value along the second axis is returned, i.e. the most probable class of each sample.
softmax_predictions = tensorflow.argmax(softmax_propabilities, axis=1)
# Cross entropy of the CNN. The op expects the raw logits (fc_result2), not softmax outputs,
# and the sparse variant takes integer class indices, so the labels are cast first.
cross_entropy = tensorflow.nn.sparse_softmax_cross_entropy_with_logits(logits=fc_result2, labels=tensorflow.cast(label_tensor, tensorflow.int64))
# Summarizing the cross entropy into a single value (cost) to be minimized by the learning algorithm.
cost = tensorflow.reduce_mean(cross_entropy)
# Minimizing the network cost using the Gradient Descent optimizer with a learning rate of 0.01.
error = tensorflow.train.GradientDescentOptimizer(learning_rate=.01).minimize(cost)
# Creating a new TensorFlow Session to process the computational graph.
sess = tensorflow.Session()
# Writing summary of the graph to visualize it using TensorBoard.
tensorflow.summary.FileWriter(logdir="\\AhmedGad\\TensorBoard\\", graph=sess.graph)
# Initializing the variables of the graph.
sess.run(tensorflow.global_variables_initializer())
# Because it may be impossible to feed the complete data to the CNN on normal machines, it is recommended to split the data into a number of patches.
# A subset of the training samples is used to create each patch. Samples for each patch can be randomly selected.
num_patches = 5  # Number of patches
for patch_num in numpy.arange(num_patches):
    print("Patch : ", str(patch_num))
    percent = 80  # Percent of samples to be included in each patch.
    # Getting the input-output data of the current patch.
    # get_patch is expected to return a random subset of the samples together with their matching labels.
    shuffled_data, shuffled_labels = get_patch(data=dataset_array, labels=dataset_labels, percent=percent)
    # Data required for cnn operation. 1)Input Images, 2)Output Labels, and 3)Dropout keep probability
    cnn_feed_dict = {data_tensor: shuffled_data,
                     label_tensor: shuffled_labels,
                     keep_prop: 0.5}
    # Running one training step on the current patch; without this call the graph is never executed.
    softmax_predictions_, _ = sess.run([softmax_predictions, error], feed_dict=cnn_feed_dict)
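The training loop calls get_patch, whose body does not appear in this article. The following is a minimal sketch of one plausible implementation, assuming it should return a random subset containing percent percent of the samples with images and labels kept aligned. The rng parameter is an addition for reproducibility and is not part of the call used above.

```python
import numpy as np

def get_patch(data, labels, percent=70, rng=None):
    """Return a random subset of (data, labels) holding `percent` percent of the samples.

    Hypothetical sketch: the same shuffled indices select both the images
    and the labels, so every image keeps its own label.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_elements = int(percent * data.shape[0] / 100)
    indices = rng.permutation(data.shape[0])[:num_elements]
    return data[indices], labels[indices]

# Tiny synthetic dataset: 10 "images" of shape 2x2x3, where image i starts with the value 12 * i.
demo_data = np.arange(10 * 2 * 2 * 3, dtype=np.float32).reshape(10, 2, 2, 3)
demo_labels = np.arange(10, dtype=np.uint8)

patch_data, patch_labels = get_patch(demo_data, demo_labels, percent=80, rng=np.random.default_rng(0))
print(patch_data.shape, patch_labels)
```

Drawing a fresh random subset on every iteration means the patches overlap; a different design would shuffle once and slice the data into disjoint mini-batches.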

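With the steps above, we have walked through nonlinear classification and CIFAR10 image recognition with TensorFlow. These examples show how TensorFlow can be used to build and train neural network models.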

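Detailed Steps and Key Points of Training the CNN

Several steps in the CNN training code above deserve a closer look:

Preparing the data
- Set the dataset path patches_dir so that it points to the directory containing the CIFAR10 training binary files.
- Use the get_dataset_images function to read and decode the training data, which yields dataset_array and dataset_labels, holding the image data and the corresponding labels.
Defining the placeholders and variables
- data_tensor: the placeholder for the input image data, of shape [None, im_dim, im_dim, num_channels], where None means any number of samples is accepted.
- label_tensor: the placeholder for the output labels, of shape [None].
- keep_prop: the probability of keeping a neuron in the dropout layer, with an initial value of 0.5.
Building the CNN model
- Call the create_CNN function with the input data, the number of classes, and the keep probability to build the CNN model. The function returns the result of the last fully connected layer, fc_result2.
Computing the predicted probabilities and labels
- softmax_propabilities: the probability of each training sample belonging to each class, computed with the softmax function.
- softmax_predictions: the predicted label of each sample, found with argmax as the class with the highest probability.
Computing the loss function
- cross_entropy: the cross entropy of the CNN, computed with TensorFlow's softmax cross-entropy with logits op.
- cost: the cross entropy reduced to a single value, which is the cost to be minimized.
Optimizing the model
- The Gradient Descent Optimizer with a learning rate of 0.01 minimizes the network cost.
Session and training
- Create a TensorFlow session sess and initialize all variables.
- Split the data into several patches (num_patches) for training, each holding a given percentage (percent) of the samples.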
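The prediction and loss steps listed above can be reproduced on a toy example. The sketch below uses plain NumPy with made-up logits for two samples and three classes (not outputs of the CIFAR10 network) to show how softmax, argmax, and the cross-entropy cost relate to one another.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits (outputs of the last FC layer) for 2 samples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # true class index of each sample

probabilities = softmax(logits)                  # analogous to softmax_propabilities
predicted_labels = probabilities.argmax(axis=1)  # analogous to softmax_predictions

# Cross entropy per sample is minus the log of the probability given to the true class.
cross_entropy = -np.log(probabilities[np.arange(len(labels)), labels])
cost = cross_entropy.mean()                      # single value to minimize

print(probabilities, predicted_labels, cost)
```

Note that the cross entropy is computed from the probability assigned to the true class, which is why TensorFlow's cross-entropy ops take the logits and the labels rather than the maximum probability.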

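Points to Note During Training

Processing the data in patches: an ordinary machine may be unable to process the whole dataset at once, so the data is split into several patches for training. This reduces memory pressure and makes training more manageable.
Choosing the learning rate: the learning rate is an important hyperparameter that controls the step size of each parameter update. If it is too large, the model may overshoot the optimum; if it is too small, training becomes very slow. This example uses a learning rate of 0.01, but in practice a suitable value usually has to be found by experiment.
Overfitting: to reduce overfitting, the model uses a dropout layer. During training, the dropout layer randomly drops a fraction of the neurons, which makes the model less dependent on the training data and improves its ability to generalize.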
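The effect of the learning rate can be seen on a one-dimensional toy problem. The sketch below minimizes f(x) = x² with plain gradient descent (a stand-in for the network cost, chosen only because its behavior is easy to verify) using three illustrative learning rates.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2.0 * x  # the gradient of x**2 is 2*x
    return x

for lr in (0.001, 0.1, 1.1):
    # Distance from the optimum (x = 0) after 50 steps.
    print(lr, abs(gradient_descent(lr)))
```

With 0.001 the iterate has barely moved after 50 steps, with 0.1 it is very close to the optimum, and with 1.1 every step overshoots by more than it corrects, so the distance from the optimum grows.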

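Evaluating and Testing the Model

After training, the model needs to be evaluated on test data to verify its performance. The following simple example tests the trained model: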

# Assumes the training steps above have already been run in the same session.
# Reading the test data: test_batch is a single pickled file, so it is decoded directly.
test_batch_path = "\\AhmedGad\\cifar-10-python\\cifar-10-batches-py\\test_batch"
test_dict = unpickle_patch(test_batch_path)
# The reshaping must match the one used in get_dataset_images for the training data.
test_dataset_array = test_dict[b"data"].reshape(-1, num_channels, im_dim, im_dim).transpose(0, 2, 3, 1)
test_dataset_labels = numpy.array(test_dict[b"labels"])

# Running the predictions with dropout disabled (keep probability of 1.0)
test_predictions = sess.run(softmax_predictions, feed_dict={data_tensor: test_dataset_array, keep_prop: 1.0})

# Computing the accuracy as the fraction of predictions that match the labels
test_accuracy = numpy.mean(test_predictions == test_dataset_labels)

print("Test Accuracy: ", test_accuracy)
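If feeding all test images in one call does not fit in memory, the accuracy can be accumulated over smaller batches. The sketch below assumes a hypothetical helper batched_accuracy that takes any prediction function; a dummy predictor stands in for the trained network so the example runs on its own.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels, batch_size=1000):
    """Accuracy of predict_fn over images, evaluated batch_size samples at a time."""
    correct = 0
    for start in range(0, len(images), batch_size):
        batch_predictions = predict_fn(images[start:start + batch_size])
        correct += int(np.sum(batch_predictions == labels[start:start + batch_size]))
    return correct / len(images)

def dummy_predict(batch):
    """Stand-in for the trained CNN: predicts class 1 for any non-black image."""
    return (batch.reshape(len(batch), -1).sum(axis=1) > 0).astype(np.uint8)

demo_images = np.zeros((5, 2, 2, 3), dtype=np.float32)
demo_images[:3] = 1.0                                    # first three images are non-black
demo_labels = np.array([1, 1, 0, 0, 0], dtype=np.uint8)  # the third image is mislabeled by the dummy

demo_accuracy = batched_accuracy(dummy_predict, demo_images, demo_labels, batch_size=2)
print(demo_accuracy)  # 0.8
```

With the session from this article, predict_fn could be a small function that runs softmax_predictions on data_tensor with keep_prop fed as 1.0.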

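Summary

This article showed how to use TensorFlow for nonlinear classification and CIFAR10 image recognition. Building an artificial neural network (ANN) that simulates the XOR gate illustrated how TensorFlow handles a nonlinear classification problem. The article then walked through image recognition on the CIFAR10 dataset with a convolutional neural network (CNN), covering data preparation, model construction, training, and testing.

TensorFlow provides a rich set of tools and functions that make building and training neural network models simpler and more efficient. In practice, the model architecture and hyperparameters can be adjusted to the problem at hand to get better performance.

Future Outlook

As deep learning continues to develop, TensorFlow keeps being updated and improved. Directions for further improvement include:
1. More efficient training algorithms: optimization algorithms that reduce training time and the consumption of computing resources.
2. More powerful model architectures: more sophisticated neural network architectures that improve performance and generalization.
3. Better visualization tools: more intuitive and capable tools that help users understand and debug their models.
4. Cross-platform support: support for more platforms and devices, so that TensorFlow is easier to use in different environments.

In short, TensorFlow has broad applications in deep learning and can be used to tackle a wide range of complex problems.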
