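The project code is split into three parts:

- Generate_Captcha: generates the captcha images (training, validation and test sets) and reads the image data and labels (the label is the image's file name);
- cnn_model: the convolutional neural network;
- driver: model training and evaluation.

1. Configuration

```python
class Config(object):
    width = 160                        # captcha image width
    height = 60                        # captcha image height
    char_num = 4                       # number of characters per captcha
    characters = range(10)             # digits 0 to 9
    test_folder = 'test'               # test set folder; likewise below
    train_folder = 'train'
    validation_folder = 'validation'
    tensorboard_folder = 'tensorboard' # TensorBoard log directory
    generate_num = (5000, 500, 500)    # number of images for the training, validation and test sets
    alpha = 1e-3                       # learning rate
    epoch = 100                        # number of training epochs
    batch_size = 64                    # number of samples per batch
    keep_prob = 0.5                    # dropout keep probability
    print_per_batch = 20               # print results every N batches
    save_per_batch = 20                # write to TensorBoard every N batches
```

2. Generating captchas (class Generate)

[Figure: example captcha image showing the text 0478]

check_path(): checks whether a folder exists and creates it if it does not.
gen_captcha(): generates captcha images. Before writing each file it checks whether a file with that name already exists, and if so it generates a new captcha instead.

3. Reading data (class ReadData)

read_data(): returns the image array (as a numpy.array) and the labels (i.e. the file names);
label2vec(): converts a file name into a one-hot vector, with one block of 10 entries per character. Example:

```
label = '1327'
label_vec = [0,1,0,0,0,0,0,0,0,0, 0,0,0,1,0,0,0,0,0,0, 0,0,1,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,1,0,0]
```

load_data(): loads all image folders and returns the image array, the labels and the number of images.

4. Defining the model (cnn_model)

The model uses three convolutional layers with a filter_size of 5. To reduce overfitting, dropout is applied after every convolutional layer. Finally, the feature maps are flattened into a matrix. The overall structure is as follows:

[Figure: model structure]

5. Training & evaluation

next_batch(): an iterator that returns the data batch by batch;
feed_data(): "feeds" the data to the model, where x is the image array, y the image labels and keep_prob the dropout keep probability;
evaluate(): evaluates the model on the validation and test sets;
run_model(): training and evaluation.

6. Current results

After 4000 iterations, training accuracy exceeds 99% and test accuracy is 93%. There is still some overfitting. The model is currently trained on a CPU, and one full training run takes about 4 hours, so I will update the results after further tuning.

```
Training images: 10000, validation images: 1000, test images: 1000
Epoch: 1
Step 0, train_acc: 7.42%, train_loss: 1.43, val_acc: 9.85%, val_loss: 1.40, improved: *
Step 20, train_acc: 12.50%, train_loss: 0.46, val_acc: 10.35%, val_loss: 0.46, improved: *
Step 40, train_acc: 9.38%, train_loss: 0.37, val_acc: 10.10%, val_loss: 0.37, improved:
Step 60, train_acc: 7.42%, train_loss: 0.34, val_acc: 10.25%, val_loss: 0.34, improved:
Step 80, train_acc: 7.81%, train_loss: 0.33, val_acc: 9.82%, val_loss: 0.33, improved:
Step 100, train_acc: 12.11%, train_loss: 0.33, val_acc: 10.00%, val_loss: 0.33, improved:
Step 120, train_acc: 9.77%, train_loss: 0.33, val_acc: 10.07%, val_loss: 0.33, improved:
Step 140, train_acc: 8.98%, train_loss: 0.33, val_acc: 10.40%, val_loss: 0.33, improved: *
Epoch: 2
Step 160, train_acc: 8.20%, train_loss: 0.33, val_acc: 10.52%, val_loss: 0.33, improved: *
...
Epoch: 51
Step 7860, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.37%, val_loss: 0.08, improved:
Step 7880, train_acc: 99.61%, train_loss: 0.01, val_acc: 92.28%, improved:
Step 7900, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.42%, val_loss: 0.08, improved:
Step 7920, train_acc: 100.00%, train_loss: 0.00, val_acc: 92.83%, val_loss: 0.08, improved:
Step 7940, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.77%, val_loss: 0.08, improved:
Step 7960, train_acc: 100.00%, train_loss: 0.01, val_acc: 92.68%, val_loss: 0.08, improved:
Step 7980, train_acc: 100.00%, train_loss: 0.00, val_acc: 92.63%, val_loss: 0.09, improved:
No improvement for more than 1000 steps, stopping automatically.
...
Test accuracy: 93.00%, loss: 0.087
```

Before each training run, the files in the TensorBoard directory are deleted; otherwise the trend charts get mixed up with those of earlier runs.

[Figures: accuracy and loss curves]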
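The check-then-regenerate behaviour of `check_path()` and `gen_captcha()` can be sketched with the standard library alone. This is a minimal illustration, not the project's code. It assumes files are named `<label>.png` and that labels are 4 digits.

The `render` hook is hypothetical. One possible renderer, not necessarily the one the project uses, is the third-party `captcha` package: `ImageCaptcha(width=160, height=60).write(label, path)`. Without a renderer, the sketch writes empty placeholder files so the logic can be tested.

```python
import os
import random


def check_path(folder):
    """Create the folder (and any missing parents) if it does not exist yet."""
    os.makedirs(folder, exist_ok=True)


def gen_captcha(folder, num, char_num=4, characters="0123456789", render=None):
    """Write `num` new captcha files named '<label>.png' into `folder`.

    A randomly drawn label whose file already exists is discarded and redrawn,
    so existing files are never overwritten. `render(label, path)` is a
    hypothetical hook that should draw the image at `path`; without it an empty
    placeholder file is written instead.
    """
    check_path(folder)
    existing = sum(1 for f in os.listdir(folder) if f.endswith(".png"))
    if num > len(characters) ** char_num - existing:
        # Conservative guard: without enough unused labels the redraw loop could never finish.
        raise ValueError("not enough unused labels left in this folder")
    labels = []
    while len(labels) < num:
        label = "".join(random.choice(characters) for _ in range(char_num))
        path = os.path.join(folder, label + ".png")
        if os.path.exists(path):
            continue  # file already exists: generate a different captcha
        if render is None:
            open(path, "wb").close()  # empty placeholder standing in for the image
        else:
            render(label, path)
        labels.append(label)
    return labels
```

Because the file name is the label, each folder can hold at most 10^4 distinct 4-digit captchas. The guard turns an impossible request into an error instead of an endless loop.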
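The one-hot encoding performed by `label2vec()` can be sketched in plain Python as below. This minimal version assumes, as in the config, 4 characters drawn from the digits 0 to 9. Each character gets its own block of 10 slots.

The inverse helper `vec2label()` is hypothetical and not part of the described project. It is included because the same block-wise arg-max also decodes the network's output scores.

```python
def label2vec(label, char_num=4, characters="0123456789"):
    """Encode a label such as '1327' as a flat one-hot list of length char_num * len(characters)."""
    if len(label) != char_num:
        raise ValueError(f"expected {char_num} characters, got {label!r}")
    n_classes = len(characters)
    vec = [0] * (char_num * n_classes)
    for pos, ch in enumerate(label):
        # Character at position `pos` owns slots [pos * n_classes, (pos + 1) * n_classes).
        vec[pos * n_classes + characters.index(ch)] = 1
    return vec


def vec2label(vec, char_num=4, characters="0123456789"):
    """Hypothetical inverse: take the largest entry in each block (also works on softmax scores)."""
    n_classes = len(characters)
    chars = []
    for pos in range(char_num):
        block = vec[pos * n_classes:(pos + 1) * n_classes]
        chars.append(characters[block.index(max(block))])
    return "".join(chars)


if __name__ == "__main__":
    vec = label2vec("1327")
    print([i for i, v in enumerate(vec) if v == 1])  # [1, 13, 22, 37]
    print(vec2label(vec))  # 1327
```

The ones land at 0*10+1, 1*10+3, 2*10+2 and 3*10+7, which reproduces the `label_vec` example for '1327'.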
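The model description leaves the size of the flattened matrix implicit. The helper below works it out, but the text does not state the padding, pooling or channel counts. So this sketch assumes 'SAME' padding, a 2x2 max pool with stride 2 after each of the three 5x5 convolutions, and hypothetical channel counts of 32, 64 and 64.

With 'SAME' padding the filter size of 5 does not change the spatial size; only the pooling does.

```python
import math


def flattened_size(height=60, width=160, channels=(32, 64, 64), pool_stride=2):
    """Feature-map shape after each conv + pool stage, and the final flattened length.

    Assumes 'SAME' padding for both the 5x5 convolutions and the pooling, so each
    convolution keeps the spatial size and each pooling divides it by pool_stride,
    rounding up. The channel counts are hypothetical.
    """
    shapes = []
    for c in channels:
        height = math.ceil(height / pool_stride)
        width = math.ceil(width / pool_stride)
        shapes.append((height, width, c))
    h, w, c = shapes[-1]
    return shapes, h * w * c


if __name__ == "__main__":
    shapes, flat = flattened_size()
    print(shapes)  # [(30, 80, 32), (15, 40, 64), (8, 20, 64)]
    print(flat)    # 10240
```

Under these assumptions, each 160x60 image becomes a row of 10240 features. A final dense layer would then have to produce char_num x 10 = 40 outputs, matching the length of the label vector.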
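`next_batch()` is described only as an iterator that returns data in batches. A minimal generator version is sketched below. It assumes the data is reshuffled each epoch and that the last, smaller batch is kept; the `shuffle` and `seed` parameters are illustrative additions.

```python
import random


def next_batch(x, y, batch_size=64, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs covering the data once; the last batch may be smaller."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    indices = list(range(len(x)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # a new order every epoch unless a seed is fixed
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [x[i] for i in batch], [y[i] for i in batch]
```

With 10000 training images and batch_size = 64, one epoch is ceil(10000 / 64) = 157 batches (steps 0 to 156). That is consistent with the log, where Epoch 2 starts between steps 140 and 160, and Epoch 51 (starting at step 50 x 157 = 7850) first prints at step 7860.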
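`evaluate()` runs the model over the validation or test set. A common way to do this is to average per-batch results weighted by batch size, with dropout switched off (keep_prob = 1.0), and this sketch assumes the project does the same.

`run_batch(x_batch, y_batch, keep_prob)` is a hypothetical callable standing in for a session run that uses `feed_data()`. It must return the loss and accuracy of one batch.

```python
def evaluate(run_batch, x, y, batch_size=64):
    """Size-weighted mean (loss, accuracy) over a whole validation or test set.

    `run_batch(x_batch, y_batch, keep_prob)` is a hypothetical callable returning
    (loss, accuracy) for one batch. keep_prob is fixed at 1.0 so dropout is
    disabled during evaluation.
    """
    n = len(x)
    if n == 0:
        raise ValueError("cannot evaluate an empty set")
    total_loss = total_acc = 0.0
    for start in range(0, n, batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        loss, acc = run_batch(xb, yb, keep_prob=1.0)
        # Weight by batch size so a smaller final batch does not skew the mean.
        total_loss += loss * len(xb)
        total_acc += acc * len(xb)
    return total_loss / n, total_acc / n
```

Weighting by batch size matters because the validation and test sets are rarely a multiple of batch_size. For example, 1000 images split into 15 batches of 64 plus one batch of 40.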
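The log does not say how train_acc and val_acc are computed. The starting values of about 10% match random guessing among 10 digits for each character. Random guessing of a whole 4-digit captcha would give about 0.01%, which hints at a per-character metric. That is an inference, not something the source confirms.

The sketch below computes both metrics from 40-entry vectors, so the two can be compared. Both function names are hypothetical.

```python
def _block_argmax(vec, pos, n_classes):
    """Index of the largest entry in the block belonging to character position `pos`."""
    block = vec[pos * n_classes:(pos + 1) * n_classes]
    return block.index(max(block))


def char_accuracy(pred_vecs, true_vecs, char_num=4, n_classes=10):
    """Fraction of individual characters predicted correctly."""
    correct = total = 0
    for pred, true in zip(pred_vecs, true_vecs):
        for pos in range(char_num):
            correct += _block_argmax(pred, pos, n_classes) == _block_argmax(true, pos, n_classes)
            total += 1
    return correct / total


def captcha_accuracy(pred_vecs, true_vecs, char_num=4, n_classes=10):
    """Fraction of captchas whose characters are all predicted correctly."""
    hits = 0
    for pred, true in zip(pred_vecs, true_vecs):
        hits += all(_block_argmax(pred, pos, n_classes) == _block_argmax(true, pos, n_classes)
                    for pos in range(char_num))
    return hits / len(true_vecs)
```

If the reported 93% is per-character, the share of fully correct captchas is likely to be lower. For example, if character errors were independent, roughly 0.93^4 ≈ 75% of captchas would be read correctly.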
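The `improved: *` marker and the final "No improvement for more than 1000 steps" message point to early stopping on validation accuracy. This minimal sketch assumes the best val_acc is tracked and checked at every evaluation step, i.e. every print_per_batch = 20 steps. The class name `EarlyStopper` and its methods are hypothetical.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when it has stalled for too long."""

    def __init__(self, patience_steps=1000):
        self.patience_steps = patience_steps
        self.best_acc = float("-inf")
        self.last_improved_step = 0

    def update(self, step, val_acc):
        """Record an evaluation; return '*' if val_acc is a new best (the log's marker), else ''."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.last_improved_step = step
            return "*"
        return ""

    def should_stop(self, step):
        """True once more than patience_steps steps have passed since the last improvement."""
        return step - self.last_improved_step > self.patience_steps
```

Feeding it the first log entries (9.85% at step 0, 10.35% at step 20, 10.10% at step 40, 10.40% at step 140) reproduces the `*`, `*`, blank, `*` pattern shown in the log.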
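Clearing the TensorBoard directory before each run, as described in the note after the training log, can be done with `shutil.rmtree`. This is a minimal sketch; the function name is hypothetical. The default path matches `Config.tensorboard_folder`.

```python
import os
import shutil


def reset_tensorboard_dir(path="tensorboard"):
    """Delete old event files so a new run's curves are not mixed with earlier runs."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)
```

An alternative that keeps history is to give each run its own subdirectory (for example, one named after a timestamp). TensorBoard then shows every subdirectory as a separate run instead of merging their curves.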
