In this tutorial we will use TensorFlow (the Keras API) to implement a deep learning model for a multi-class recognition task on the Arabic handwritten characters dataset. The dataset can be downloaded from: https://www.kaggle.com/mloey1/ahcd1

Dataset introduction

The dataset consists of 16,800 characters written by 60 participants aged between 19 and 40; 90% of the participants are right-handed. Each participant wrote each character (from "alef" to "yeh") ten times on two forms, as shown in Fig. 7(a) and 7(b). The forms were scanned at 300 dpi, and each block was segmented automatically with Matlab 2016a to determine its coordinates. The database is split into a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). The labels run from 1 to 28. All of the data is provided as CSV files containing the image pixel values and their corresponding labels; no image files are provided.

Import modules

```python
import numpy as np
import pandas as pd
from IPython.display import display   # allows display()
# Libraries needed to read and process the images
from scipy.ndimage import rotate
from PIL import Image
import csv
```

Read the data

```python
# Training data: images and labels
letters_training_images_file_path = "../input/ahcd1/csvTrainImages13440x1024.csv"
letters_training_labels_file_path = "../input/ahcd1/csvTrainLabel13440x1.csv"
# Testing data: images and labels
letters_testing_images_file_path = "../input/ahcd1/csvTestImages3360x1024.csv"
letters_testing_labels_file_path = "../input/ahcd1/csvTestLabel3360x1.csv"

# Load the data
training_letters_images = pd.read_csv(letters_training_images_file_path, header=None)
training_letters_labels = pd.read_csv(letters_training_labels_file_path, header=None)
testing_letters_images = pd.read_csv(letters_testing_images_file_path, header=None)
testing_letters_labels = pd.read_csv(letters_testing_labels_file_path, header=None)

print("%d 32x32-pixel training Arabic letter images." % training_letters_images.shape[0])
print("%d 32x32-pixel testing Arabic letter images." % testing_letters_images.shape[0])
training_letters_images.head()
```

Output:

```
13440 32x32-pixel training Arabic letter images.
3360 32x32-pixel testing Arabic letter images.
```

Check the label values:

```python
np.unique(training_letters_labels)
# array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
#        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], dtype=int32)
```

Next we need to convert the CSV values back into images, so that we can display the pixel values of a row as an image.

```python
def convert_values_to_image(image_values, display=False):
    image_array = np.asarray(image_values)
    image_array = image_array.reshape(32, 32).astype('uint8')
    # The original dataset is mirrored, so we flip it with np.flip and
    # then rotate it to get a correctly oriented image.
    image_array = np.flip(image_array, 0)
    image_array = rotate(image_array, -90)
    new_image = Image.fromarray(image_array)
    if display == True:
        new_image.show()
    return new_image

convert_values_to_image(training_letters_images.loc[0], True)
```

This displays the first training sample, the letter "alef". Next we preprocess the data. The main step is image normalization: we rescale every image into [0, 1] by dividing each pixel value by 255.
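Before applying the rescaling to the dataset, the idea can be sketched in isolation (a minimal example on a made-up uint8 array, not the dataset itself):

```python
import numpy as np

# Hypothetical 2x2 grayscale patch with uint8 pixel values.
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Casting to float32 and dividing by 255 rescales the pixels into [0, 1].
scaled = pixels.astype('float32') / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```

The cast to float32 matters: dividing uint8 values directly would also work in NumPy, but the model expects floating-point input.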
```python
training_letters_images_scaled = training_letters_images.values.astype('float32') / 255
training_letters_labels = training_letters_labels.values.astype('int32')
testing_letters_images_scaled = testing_letters_images.values.astype('float32') / 255
testing_letters_labels = testing_letters_labels.values.astype('int32')

print("Training images of letters after scaling")
print(training_letters_images_scaled.shape)
```

Output:

```
Training images of letters after scaling
(13440, 1024)
```

From the label CSV file we can see that this is a multi-class classification problem, so the next step is label encoding: converting the class vector into a matrix. Labels 1 through 28 (the letters "alef" through "yeh") become class numbers 0 through 27. to_categorical converts a class vector into a binary (0s and 1s) matrix: each row of the one-hot encoded matrix contains a single "1" at the position of the class and "0" everywhere else. Here we use Keras's one-hot encoding for these class values.

```python
from keras.utils import to_categorical

# one-hot encoding
number_of_classes = 28
training_letters_labels_encoded = to_categorical(training_letters_labels - 1,
                                                 num_classes=number_of_classes)
testing_letters_labels_encoded = to_categorical(testing_letters_labels - 1,
                                                num_classes=number_of_classes)
```

Next we reshape the input images to 32x32x1. When TensorFlow is used as the backend, a Keras CNN requires a 4D array as input, with shape (nb_samples, rows, columns, channels), where nb_samples is the total number of images (samples) and rows, columns and channels are the number of rows, columns and channels of each image. Since our images are 32x32-pixel grayscale images, we reshape them into a 4D volume of shape (nb_samples, 32, 32, 1).

```python
# Reshape the input letter images to 32x32x1
training_letters_images_scaled = training_letters_images_scaled.reshape([-1, 32, 32, 1])
testing_letters_images_scaled = testing_letters_images_scaled.reshape([-1, 32, 32, 1])
print(training_letters_images_scaled.shape, training_letters_labels_encoded.shape,
      testing_letters_images_scaled.shape, testing_letters_labels_encoded.shape)
# (13440, 32, 32, 1) (13440, 28) (3360, 32, 32, 1) (3360, 28)
```
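The reshape step can be sketched on its own (a minimal example with a hypothetical batch of zeros standing in for the real CSV rows):

```python
import numpy as np

# Hypothetical batch of 3 flattened 32x32 grayscale images, like the CSV rows.
flat = np.zeros((3, 1024), dtype=np.float32)

# -1 lets NumPy infer the sample count; the trailing 1 is the channel axis
# that a Keras CNN with a TensorFlow backend expects.
volume = flat.reshape([-1, 32, 32, 1])
print(volume.shape)  # (3, 32, 32, 1)
```

The reshape only rearranges the existing values; no pixels are added or lost.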
Design the model architecture

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Dropout, Dense

def create_model(optimizer='adam', kernel_initializer='uniform', activation='relu'):
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=3, padding='same', input_shape=(32, 32, 1),
                     kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=32, kernel_size=3, padding='same',
                     kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=64, kernel_size=3, padding='same',
                     kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=128, kernel_size=3, padding='same',
                     kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(GlobalAveragePooling2D())

    # Fully connected final layer
    model.add(Dense(28, activation='softmax'))

    # Compile the model
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=optimizer)
    return model
```

Model structure: the first hidden layer is a convolutional layer with 16 feature maps of size 3x3 and a ReLU activation function. It is the input layer and expects images with the shape described above. The second layer is a batch normalization layer, which addresses shifts in the feature distribution between the training and test data: the BN layer normalizes the inputs to the activation function, countering the effect of input shift and scaling. The third layer is a MaxPooling layer. Max pooling downsamples the input, allowing the model to make assumptions about the features and thereby reducing overfitting; it also reduces the number of parameters to learn, shortening training time. The next layer is a regularization layer using dropout, configured to randomly exclude 20% of the neurons in the layer to reduce overfitting.

Another hidden layer contains 32 feature maps of size 3x3 with a ReLU activation, to capture more features from the image. Further hidden layers contain 64 and 128 feature maps of size 3x3 with ReLU activations; the convolution, batch normalization, MaxPooling and regularization pattern is repeated three times, followed by a GlobalAveragePooling2D layer. The last layer is the output layer, with one neuron per output class; it uses the softmax activation function since we have multiple classes, and each neuron gives the probability of its class. Categorical cross-entropy is used as the loss function because this is a multi-class classification problem, and accuracy is used as the metric.

```python
model = create_model(optimizer='Adam', kernel_initializer='uniform', activation='relu')
model.summary()
```

Keras supports plotting models via the keras.utils.vis_utils module, which provides utility functions to draw Keras models with graphviz.

```python
import pydot
from keras.utils import plot_model

plot_model(model, to_file="model.png", show_shapes=True)

from IPython.display import Image as IPythonImage
display(IPythonImage('model.png'))
```
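As a sanity check on the shapes reported by model.summary(), the spatial dimensions through the four convolution/pooling stages can be traced by hand. This is a minimal sketch with a hypothetical helper trace_shapes (not part of the tutorial code): a 'same'-padded 3x3 convolution keeps the spatial size, and each pool_size=2 max pooling halves it.

```python
def trace_shapes(input_size=32, stages=(16, 32, 64, 128)):
    """Trace (spatial size, channels) through the conv/pool stack."""
    size = input_size
    shapes = []
    for filters in stages:
        # 'same'-padded conv keeps the size; MaxPooling2D(pool_size=2) halves it.
        size //= 2
        shapes.append((size, filters))
    return shapes

print(trace_shapes())  # [(16, 16), (8, 32), (4, 64), (2, 128)]
```

So GlobalAveragePooling2D averages a 2x2x128 volume into a 128-dimensional vector, which feeds the final Dense(28) layer.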
Train the model with batch_size=20 for 15 epochs.

```python
from keras.callbacks import ModelCheckpoint

# Use a checkpoint to save the best model weights for later use.
checkpointer = ModelCheckpoint(filepath='weights.hdf5', verbose=1, save_best_only=True)
history = model.fit(training_letters_images_scaled, training_letters_labels_encoded,
                    validation_data=(testing_letters_images_scaled, testing_letters_labels_encoded),
                    epochs=15, batch_size=20, verbose=2, callbacks=[checkpointer])
```

Finally, plot the loss and accuracy curves against the training epochs.

```python
import matplotlib.pyplot as plt

def plot_loss_accuracy(history):
    # Loss
    plt.figure(figsize=[8, 6])
    plt.plot(history.history['loss'], 'r', linewidth=3.0)
    plt.plot(history.history['val_loss'], 'b', linewidth=3.0)
    plt.legend(['Training Loss', 'Validation Loss'], fontsize=18)
    plt.xlabel('Epochs', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.title('Loss Curves', fontsize=16)
    # Accuracy
    plt.figure(figsize=[8, 6])
    plt.plot(history.history['accuracy'], 'r', linewidth=3.0)
    plt.plot(history.history['val_accuracy'], 'b', linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'], fontsize=18)
    plt.xlabel('Epochs', fontsize=16)
    plt.ylabel('Accuracy', fontsize=16)
    plt.title('Accuracy Curves', fontsize=16)

plot_loss_accuracy(history)
```

Load the model with the best validation loss and evaluate it on the test set.

```python
model.load_weights('weights.hdf5')
metrics = model.evaluate(testing_letters_images_scaled, testing_letters_labels_encoded, verbose=1)
print("Test Accuracy: {}".format(metrics[1]))
print("Test Loss: {}".format(metrics[0]))
```

Output:

```
3360/3360 [==============================] - 0s 87us/step
Test Accuracy: 0.9678571224212646
Test Loss: 0.11759862171020359
```

Print the per-class classification report.

```python
from sklearn.metrics import classification_report

def get_predicted_classes(model, data, labels=None):
    image_predictions = model.predict(data)
    predicted_classes = np.argmax(image_predictions, axis=1)
    true_classes = np.argmax(labels, axis=1)
    return predicted_classes, true_classes, image_predictions

def get_classification_report(y_true, y_pred):
    print(classification_report(y_true, y_pred))

y_pred, y_true, image_predictions = get_predicted_classes(
    model, testing_letters_images_scaled, testing_letters_labels_encoded)
get_classification_report(y_true, y_pred)
```
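The argmax logic inside get_predicted_classes can be checked in isolation (a minimal sketch using made-up probability rows and one-hot labels, not real model output):

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 4 classes.
image_predictions = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.4],
])
# Hypothetical one-hot ground-truth labels for the same samples.
labels = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])

# argmax along axis=1 recovers the class index from each row.
predicted_classes = np.argmax(image_predictions, axis=1)  # [1, 0, 3]
true_classes = np.argmax(labels, axis=1)                  # [1, 0, 2]
accuracy = np.mean(predicted_classes == true_classes)
print(accuracy)  # 2 of 3 correct
```

This is exactly the inverse of the one-hot encoding performed earlier with to_categorical.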
Running the report on the test set prints per-class precision, recall, f1-score and support (abridged here; every class has a support of 120 test images):

```
              precision    recall  f1-score   support

           0       1.00      0.98      0.99       120
           1       1.00      0.98      0.99       120
           2       0.80      0.98      0.88       120
           3       0.98      0.88      0.93       120
           4       0.99      0.97      0.98       120
           5       0.92      0.99      0.96       120
         ...
    accuracy                           0.96      3360
   macro avg       0.96      0.96      0.96      3360
weighted avg       0.96      0.96      0.96      3360
```

Finally, plot the model's predictions for a random sample of 49 images.

```python
# Draw random indices into the training set, whose images are plotted below.
indices = np.random.randint(0, training_letters_labels.shape[0], size=49)
y_pred = np.argmax(model.predict(training_letters_images_scaled), axis=1)

for i, idx in enumerate(indices):
    plt.subplot(7, 7, i + 1)
    image_array = training_letters_images_scaled[idx][:, :, 0]
    image_array = np.flip(image_array, 0)
    image_array = rotate(image_array, -90)
    plt.imshow(image_array, cmap='gray')
    plt.title("Pred: {} - Label: {}".format(y_pred[idx], int(training_letters_labels[idx]) - 1))
    plt.xticks([])
    plt.yticks([])
plt.show()
```
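Beyond the per-class report, a confusion matrix shows which letters get mistaken for which. A minimal NumPy sketch with made-up label vectors and a hypothetical confusion_matrix helper (sklearn.metrics.confusion_matrix computes the same thing on the real y_true/y_pred):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Hypothetical true and predicted classes for 6 samples over 3 classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# Diagonal entries count correct predictions; off-diagonal entries count confusions,
# so the trace divided by the total gives the overall accuracy.
print(np.trace(cm) / cm.sum())
```

On the 28-class test set this yields a 28x28 matrix whose largest off-diagonal entries identify the most commonly confused letter pairs.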
