This post uses the plant disease dataset from AI Challenger 2018:
https://challenger.ai/dataset/pdd2018
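After extracting the archive you get a set of image folders plus JSON annotation files.
The Paddle convolutional neural network reads its data from a txt file of (path, label) pairs, so the dataset has to be preprocessed first. The preprocessing code is: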
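Before converting anything, it can help to check how the annotations are distributed across classes. The sketch below assumes each annotation file is a JSON array of objects with `image_id` and `disease_class` fields, which is how the preprocessing code in this post accesses them. The helper name `class_counts` and the sample records are made up for illustration.

```python
import json
from collections import Counter


def class_counts(records):
    """Count how many annotated images fall into each disease_class."""
    return Counter(r["disease_class"] for r in records)


# Hypothetical miniature stand-in for an annotation file's content.
sample = json.loads(
    '[{"image_id": "a.jpg", "disease_class": 1},'
    ' {"image_id": "b.jpg", "disease_class": 1},'
    ' {"image_id": "c.jpg", "disease_class": 4}]'
)

for label, count in sorted(class_counts(sample).items()):
    print(label, count)
```

On the real data you would pass the list returned by `json.load` instead of `sample`.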
import json

# Read the local dataset. The 2018 release ships JSON annotations that give
# the image file name and class; just change the dataset paths before use.
def create_data_list(data_root_path):
    # Training set: write one "image_path label" entry per line.
    with open(data_root_path + '_trainingset/AgriculturalDisease_train_annotations.json', 'r') as f:
        a = json.load(f)
    print(a[-1])  # sanity check: show the last annotation
    # 'w' instead of 'a' so that re-running does not duplicate entries.
    with open("C:/Users/14997/Desktop/database/train.list", 'w') as out:
        for item in a:  # iterate over every record instead of a hard-coded count
            out.write(data_root_path + "_trainingset/images/" + item["image_id"] + " %d" % item["disease_class"] + "\n")

    # Validation set: same format, written to test.list.
    with open(data_root_path + '_validationset/AgriculturalDisease_validation_annotations.json', 'r') as h:
        b = json.load(h)
    print(b[-1])
    with open("C:/Users/14997/Desktop/database/test.list", 'w') as out:
        for item in b:  # the original reused the training-set count here, which overruns
            out.write(data_root_path + "_validationset/images/" + item["image_id"] + " %d" % item["disease_class"] + "\n")

create_data_list('C:/Users/14997/Desktop/database/AgriculturalDisease')
The result looks like this:
This produces the (path, label) pairs the network needs.
OK, that's all for today.
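To check the generated files, the lines can be parsed back into (path, label) tuples. This is only a minimal sketch: `parse_list_lines` is a hypothetical helper, not part of Paddle, and it assumes every entry sits on its own line as an image path, a single space, and an integer label. The sample paths are invented.

```python
from typing import List, Tuple


def parse_list_lines(lines: List[str]) -> List[Tuple[str, int]]:
    """Turn 'path label' lines into (path, label) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last space only, so paths containing spaces survive.
        path, label = line.rsplit(" ", 1)
        samples.append((path, int(label)))
    return samples


# Hypothetical sample lines in the assumed list-file format.
example = [
    "C:/data/images/a.jpg 3\n",
    "C:/data/my images/b.jpg 17\n",
]
print(parse_list_lines(example))
```

For a real file, pass the open file object's lines, for example `parse_list_lines(open(path).readlines())`.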