Notes on "Deep Learning with Python": 5.2-2, Cat vs. Dog Classification (Image Data Preparation)
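I. Summary
In one sentence:
[Split the cat and dog images from the original training data into training, validation, and test sets]: in practice this is nothing more than copying image files into the right folders.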
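1. How do you join paths and create directories with Python's os module?
Joining paths: the join function of os.path: train_dir = os.path.join(base_dir, 'train')
Creating a directory: the mkdir function of os: os.mkdir(train_dir)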
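The two calls above can be tried out safely in a temporary directory. This is a minimal sketch, not the notebook's own code: the folder name `demo_base` is hypothetical. It also shows that `os.mkdir` raises `FileExistsError` when the directory already exists, which is what happens if a notebook cell is run twice, and that `os.makedirs(..., exist_ok=True)` is one way around it.

```python
import os
import tempfile

# Work inside a throwaway directory; 'demo_base' is a hypothetical name.
root = tempfile.mkdtemp()
base_dir = os.path.join(root, 'demo_base')
train_dir = os.path.join(base_dir, 'train')

os.mkdir(base_dir)    # creates exactly one level; the parent must already exist
os.mkdir(train_dir)

# Calling os.mkdir again on an existing directory raises FileExistsError.
try:
    os.mkdir(train_dir)
    raised = False
except FileExistsError:
    raised = True

# os.makedirs creates missing parents and can tolerate existing directories.
nested = os.path.join(base_dir, 'validation', 'cats')
os.makedirs(nested, exist_ok=True)
os.makedirs(nested, exist_ok=True)  # second call does nothing
```

os.path.join picks the separator for the current platform, so the same code works on Windows and on Linux or macOS.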
2. What does this line mean: fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]?
'cat.{}.jpg'.format(i) evaluates to cat.0.jpg when i is 0, so the list holds the names cat.0.jpg through cat.999.jpg
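A short sketch of what the list comprehension produces. The f-string variant is an addition for comparison and is not used in the notebook.

```python
# Build the file names the notebook iterates over.
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]

# An f-string (Python 3.6+) produces the same names.
fnames_f = [f'cat.{i}.jpg' for i in range(1000)]

first, last, count = fnames[0], fnames[-1], len(fnames)
```

Here `first` is 'cat.0.jpg', `last` is 'cat.999.jpg', and `count` is 1000, because range(1000) stops before 1000.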
3. How do you copy an image to another directory in Python?
Use the copyfile function of shutil: shutil.copyfile(src, dst)
# Copy the first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# print(fnames)
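The copy loop can be tested without the real dataset. The sketch below uses temporary directories and small placeholder files in place of images; all names in it are made up for the demo. Note that shutil.copyfile expects `dst` to be a complete file path, whereas shutil.copy also accepts a destination directory.

```python
import os
import shutil
import tempfile

# Hypothetical source and destination directories for the demo.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

# Create three small placeholder files standing in for images.
for i in range(3):
    with open(os.path.join(src_dir, 'cat.{}.jpg'.format(i)), 'wb') as f:
        f.write(b'fake image bytes %d' % i)

# Copy only the first two, selecting files by index range as the notebook does.
for fname in ['cat.{}.jpg'.format(i) for i in range(2)]:
    src = os.path.join(src_dir, fname)
    dst = os.path.join(dst_dir, fname)   # full target file path, not a folder
    shutil.copyfile(src, dst)

copied = sorted(os.listdir(dst_dir))
```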
4. How do you count the files in a folder in Python?
Take the length of the list returned by os.listdir: print('total training cat images:', len(os.listdir(train_cats_dir)))
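os.listdir returns every entry in the directory, including subfolders and unrelated files, so its length equals the image count only when the folder holds nothing else. A possible stricter count is sketched below; the helper name `count_images` is hypothetical and not part of the notebook.

```python
import os
import tempfile

def count_images(directory, ext='.jpg'):
    """Count regular files in `directory` whose names end with `ext`."""
    return sum(
        1 for name in os.listdir(directory)
        if name.lower().endswith(ext)
        and os.path.isfile(os.path.join(directory, name))
    )

# Demo folder with two placeholder images, one text file, and one subfolder.
demo_dir = tempfile.mkdtemp()
for name in ['cat.0.jpg', 'cat.1.jpg', 'notes.txt']:
    open(os.path.join(demo_dir, name), 'w').close()
os.mkdir(os.path.join(demo_dir, 'subfolder'))

total_entries = len(os.listdir(demo_dir))   # counts every entry
jpg_files = count_images(demo_dir)          # counts only the .jpg files
```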
II. Content (the key points are covered in the summary above)
Note: what is shutil?
The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal. For operations on individual files, see also the os module.
1. Copying images to the training, validation, and test directories
In [5]:
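import os  # needed by this cell and the following ones
# Directory that will hold the smaller dataset
# (raw string, so that sequences such as \001 are not interpreted as escapes)
base_dir = r'E:\78_recorded_lesson\001_course_github\AI_dataSet\dogs-vs-cats\cats_and_dogs_small'
os.mkdir(base_dir)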
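Windows paths written with backslashes need care: in an ordinary string literal, Python interprets sequences such as `\t` (tab) and `\001` (octal escape) instead of keeping them as path separators. The sketch below shows the difference using a shortened, hypothetical path; prefixing the literal with `r` keeps every backslash as written.

```python
# In an ordinary string literal, '\t' is a tab and '\001' is an octal escape.
plain = 'E:\001_course\train'
raw = r'E:\001_course\train'

has_tab = '\t' in plain          # True: '\t' was turned into a tab character
raw_has_tab = '\t' in raw        # False: the raw string keeps the backslash
backslashes_in_raw = raw.count('\\')
```

Forward slashes ('E:/001_course/train') are another option, since Python on Windows accepts them in file paths.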
In [6]:
# Directories for the training, validation, and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)
In [7]:
# Directory with the training cat images
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
In [8]:
# Directory with the training dog images
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
In [9]:
# Directory with the validation cat images
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)
# Directory with the validation dog images
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)
In [10]:
# Directory with the test cat images
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
# Directory with the test dog images
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)
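The nine os.mkdir calls above could also be written as a loop. The following is a sketch of that alternative, not the book's code. It uses a temporary stand-in for `base_dir`, and the `dirs` dictionary is a hypothetical way to keep the resulting paths. With `exist_ok=True`, running the cell again does not raise an error.

```python
import os
import tempfile

# Stand-in for base_dir so the sketch does not touch the real dataset folder.
base_dir = os.path.join(tempfile.mkdtemp(), 'cats_and_dogs_small')

dirs = {}
for split in ('train', 'validation', 'test'):
    for label in ('cats', 'dogs'):
        path = os.path.join(base_dir, split, label)
        os.makedirs(path, exist_ok=True)   # also creates base_dir and the split folder
        dirs[(split, label)] = path

# Individual paths can be looked up by (split, label).
train_cats_dir = dirs[('train', 'cats')]
```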
Note: copying the files
=================================
'cat.{}.jpg'.format(i) evaluates to cat.0.jpg when i is 0
In [12]:
import os, shutil
# Path to the directory where the original dataset was uncompressed
# (raw string, so that \t in "\train" is not read as a tab)
original_dataset_dir = r'E:\78_recorded_lesson\001_course_github\AI_dataSet\dogs-vs-cats\kaggle_original_data\train'
In [13]:
# Copy the first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# print(fnames)
In [14]:
# Copy the next 500 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)
In [15]:
# Copy the next 500 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)
In [16]:
# Copy the first 1000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 500 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)
# Copy the next 500 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
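The six copy loops differ only in the label, the index range, and the destination, so they could be folded into one helper. The sketch below is one possible refactoring and not the book's code: `copy_range` and the `splits` table are hypothetical names, and the demo uses four placeholder files per class instead of the notebook's 1000/500/500.

```python
import os
import shutil
import tempfile

def copy_range(label, start, stop, src_dir, dst_dir):
    """Copy '<label>.<i>.jpg' for i in [start, stop) from src_dir to dst_dir."""
    for i in range(start, stop):
        fname = '{}.{}.jpg'.format(label, i)
        shutil.copyfile(os.path.join(src_dir, fname),
                        os.path.join(dst_dir, fname))

# Tiny demo with empty placeholder files standing in for images.
src_dir = tempfile.mkdtemp()
out_root = tempfile.mkdtemp()
for label in ('cat', 'dog'):
    for i in range(4):
        open(os.path.join(src_dir, '{}.{}.jpg'.format(label, i)), 'w').close()

# (split name, start index, stop index), scaled down for the demo.
splits = [('train', 0, 2), ('validation', 2, 3), ('test', 3, 4)]
for split, start, stop in splits:
    for label in ('cat', 'dog'):
        dst_dir = os.path.join(out_root, split, label + 's')
        os.makedirs(dst_dir, exist_ok=True)
        copy_range(label, start, stop, src_dir, dst_dir)

train_cats = sorted(os.listdir(os.path.join(out_root, 'train', 'cats')))
```

For the notebook's split, the table would read [('train', 0, 1000), ('validation', 1000, 1500), ('test', 1500, 2000)].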
Note: checking the image counts
In [17]:
# As a sanity check, count how many images each split (training / validation / test) contains.
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
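Given the index ranges used in the copy step, each class should end up with 1000 training, 500 validation, and 500 test images, provided the target folders were empty beforehand. A further check that may be worth running is that no file name appears in more than one split, since overlap would leak training images into evaluation. The sketch below illustrates such a check on placeholder files; `split_overlaps` is a hypothetical helper, not part of the notebook.

```python
import os
import tempfile

def split_overlaps(split_dirs):
    """Return the file names that appear in more than one of the given directories."""
    seen, dupes = set(), set()
    for d in split_dirs:
        names = set(os.listdir(d))
        dupes |= seen & names   # names already seen in an earlier split
        seen |= names
    return dupes

# Demo layout with empty placeholder files.
root = tempfile.mkdtemp()
layout = {
    'train': ['cat.0.jpg', 'cat.1.jpg'],
    'validation': ['cat.2.jpg'],
    'test': ['cat.3.jpg'],
}
dirs = []
for split, names in layout.items():
    d = os.path.join(root, split, 'cats')
    os.makedirs(d)
    for name in names:
        open(os.path.join(d, name), 'w').close()
    dirs.append(d)

overlap = split_overlaps(dirs)   # empty set when the splits are disjoint
```

In the notebook, the same function could be called with [train_cats_dir, validation_cats_dir, test_cats_dir] and again with the three dog directories.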