os.walk | Image dataset

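What the function does: it walks a directory tree and, for every folder under the given top directory, reports the folder's path, its subfolder names, and its file names.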

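'''
os.walk(top, topdown=True, onerror=None, followlinks=False)
Parameters:
top -- path of the directory to walk. os.walk is a generator that yields one 3-tuple (root, dirs, files) for every directory in the tree, top included.
root is the path of the directory currently being visited.
dirs is a list of the names of the subdirectories directly inside root (deeper levels are not included).
files is a list of the names of the non-directory files directly inside root (files in subdirectories are not included).
topdown -- optional, default True. If True, the tuple for a directory is yielded before the tuples for its subdirectories; if False, the subdirectories come first. Every directory is visited either way, only the order changes.
onerror -- optional. A callable that takes one argument; when listing a directory fails, walk calls it with the OSError instance. By default such errors are ignored.
followlinks -- optional, default False. If True, walk also descends into directories reached through symbolic links (on systems that support them); if False, symlinked directories are listed in dirs but not entered.
'''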

Function definition
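
The effect of topdown and onerror can be checked on a small throwaway tree. The following is a minimal sketch, not part of the original notes: the folder layout (images/0, images/1), the file name a.bmp, and the helper names build_demo_tree, walk_roots and collect_walk_errors are all made up for illustration.

```python
import os
import tempfile


def build_demo_tree(base):
    """Create base/images/0 and base/images/1, each holding one empty file."""
    for label in ("0", "1"):
        folder = os.path.join(base, "images", label)
        os.makedirs(folder)
        open(os.path.join(folder, "a.bmp"), "w").close()


def walk_roots(base, topdown):
    """Return the visited directories, relative to base, in visiting order."""
    visited = []
    for root, dirs, files in os.walk(base, topdown=topdown):
        # With topdown=True, sorting dirs in place also fixes the order of descent.
        # With topdown=False the subdirectories were already visited, so it has no effect.
        dirs.sort()
        visited.append(os.path.relpath(root, base))
    return visited


def collect_walk_errors(path):
    """Walk path and return the OSError instances that were passed to onerror."""
    errors = []
    for _ in os.walk(path, onerror=errors.append):
        pass
    return errors


with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    top_down = walk_roots(base, topdown=True)
    bottom_up = walk_roots(base, topdown=False)
    # A directory that does not exist: without onerror the walk would silently yield nothing.
    missing_errors = collect_walk_errors(os.path.join(base, "no_such_dir"))

print(top_down)             # parents first: '.', then 'images', then its two subfolders
print(bottom_up)            # children first: the two subfolders, then 'images', then '.'
print(len(missing_errors))  # 1
```

With topdown=True the dirs list may be edited in place (sorted, or entries removed) to control which subfolders are visited next; with topdown=False that has no effect on the walk.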

import os

# Print every value of root (root is the path of the folder currently being walked)
for root, dirs, files in os.walk(".", topdown=True):
    print(os.getcwd())
    print(root)

'''
Note: topdown=True starts from the top level, so each folder is visited before the folders inside it.

Output:

D:pythonTensorFlow1_data_input_create4.3    ## the current working directory never changes during the walk (here, the folder containing the script)
.                                           ## the top-level folder ('.' is the current working directory) [level 1]
D:pythonTensorFlow1_data_input_create4.3    ## the current working directory is still the same
.\mnist_digits_images                       ## a subfolder [level 2]
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\0                     ## a subfolder at [level 3]; level 3 has 10 folders, visited in turn
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\1
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\2
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\3
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\4
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\5
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\6
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\7
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\8
D:pythonTensorFlow1_data_input_create4.3
.\mnist_digits_images\9

'''

Viewing every root

for root, dirs, files in os.walk(".", topdown=True):
    print(dirs)
'''
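['mnist_digits_images']                              ### the top directory contains a single folder [level 2]
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']   ### the folder being walked contains 10 folders [level 3]
[]                                                   ### folder 0 contains no folders
[]                                                   ### folder 1 contains no folders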
[]
[]
[]
[]
[]
[]
[]
[]
'''

Viewing every dirs

for root,dirs,files in os.walk(".",topdown=True):
    print(files)

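'''
['4.3_data_input_create.py', 'os模块.py', '配套知识点.py']    ### [level 1] all files in the top folder
[]                                                           ### [level 2] no files
['0.bmp', '1.bmp', '10.bmp', '100.bmp', '101.bmp', '102.bmp', '103.bmp', '104.bmp',
                              ### [level 3] many files; only the start of the list for folder 0 is shown, folder 1 and the others look similar
'''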

Viewing every files
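
Since files only lists the entries directly inside root, it can be used to count the images in each class folder. A minimal sketch under assumptions: the helper name count_files_by_folder, the .bmp filter and the demo file names are illustrative and not taken from the original script.

```python
import os
import tempfile


def count_files_by_folder(top, extension=".bmp"):
    """Return {folder relative to top: number of files ending in extension}."""
    counts = {}
    for root, dirs, files in os.walk(top):
        matching = [name for name in files if name.lower().endswith(extension)]
        if matching:  # folders without matching files are left out
            counts[os.path.relpath(root, top)] = len(matching)
    return counts


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: two class folders plus one non-image file at the top level.
    for label, name in [("0", "0.bmp"), ("0", "1.bmp"), ("1", "0.bmp")]:
        os.makedirs(os.path.join(top, label), exist_ok=True)
        open(os.path.join(top, label, name), "w").close()
    open(os.path.join(top, "notes.txt"), "w").close()
    counts = count_files_by_folder(top)

print(counts)  # two .bmp files under '0', one under '1'; notes.txt is ignored
```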

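import os
import time

## Join the folder path and the file name into a file path (relative to the working directory, not absolute), and take the current folder name from the folder path
for (dirpath, dirsname, filesname) in os.walk('mnist_digits_images', topdown=True):
    for filename in filesname:
        filename_path = os.path.join(dirpath, filename)
        print(filename_path)
        time.sleep(1)    # pause so the output can be read
        dir_name = os.path.basename(dirpath)    # last component of the folder path
        print(dir_name)
        time.sleep(12)   # pause so the output can be read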
        '''
        First iteration
        mnist_digits_images\0\0.bmp   ## path of the file
        0                             ## name of the current folder
        Second iteration
        mnist_digits_images\0\1.bmp
        0
        '''

File path and the name of the folder being walked
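
The later steps use two lists, lfilenames and labelsnames, that the snippets here never build. One plausible way to fill them during the walk is sketched below; this is an assumption about the missing step, not the original script, and the helper name collect_paths_and_labels and the demo files are made up.

```python
import os
import tempfile


def collect_paths_and_labels(top):
    """Return (file paths, parent-folder names) for every file under top."""
    lfilenames, labelsnames = [], []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit the class folders in a predictable order
        for filename in sorted(filenames):
            lfilenames.append(os.path.join(dirpath, filename))
            # The folder that holds the image serves as its label.
            labelsnames.append(os.path.basename(dirpath))
    return lfilenames, labelsnames


with tempfile.TemporaryDirectory() as base:
    top = os.path.join(base, "mnist_digits_images")
    for label, name in [("0", "0.bmp"), ("0", "1.bmp"), ("1", "0.bmp")]:
        os.makedirs(os.path.join(top, label), exist_ok=True)
        open(os.path.join(top, label, name), "w").close()
    lfilenames, labelsnames = collect_paths_and_labels(top)
    relative = [os.path.relpath(path, top) for path in lfilenames]

print(relative)      # one path per image, grouped by class folder
print(labelsnames)   # ['0', '0', '1']
```

os.path.join and os.path.basename pick the right separator for the platform, so the same code works with '\' on Windows and '/' elsewhere.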

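## Map the folder names (strings) to numeric labels
## labelsnames is assumed to hold one folder name per image, collected during the walk
## Deduplicate and sort: set() is an unordered collection without duplicates; sorted() returns a new list, ascending by default
lab = sorted(set(labelsnames))
'''
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
'''
## Map to numbers
labdict = dict(zip(lab, range(len(lab))))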
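'''
{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}

Note on zip (Python 3, where zip returns an iterator, so list() is needed to see its contents):
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [4, 5, 6, 7, 8]
>>> zipped = list(zip(a, b))    # pack into a list of tuples
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a, c))             # the length matches the shortest input
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))          # the reverse of zip: *zipped unpacks the pairs back into columns
[(1, 2, 3), (4, 5, 6)]
'''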

labels = [labdict[i] for i in labelsnames]
'''
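List comprehension: iterate over the folder names (strings) ['0', '0', ..., '9'] and look each one up in the dictionary to get its numeric label [0, 0, ..., 9]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..., 9]   each number corresponds to one image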
'''

Mapping the folder names (strings) to numeric labels
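
sorted() compares strings character by character, which gives the expected order for the single-digit names '0' to '9' but not for names such as '10'. The sketch below shows the difference; the helper name build_label_map, its numeric flag and the sample names are illustrative, and key=int assumes every folder name is an integer.

```python
def build_label_map(labelsnames, numeric=False):
    """Map each distinct folder name to an integer index.

    numeric=True sorts the names by integer value instead of as strings;
    it assumes every name can be parsed by int().
    """
    key = int if numeric else None
    names = sorted(set(labelsnames), key=key)
    return {name: index for index, name in enumerate(names)}


labelsnames = ["2", "10", "0", "10", "1", "2"]

lexical = build_label_map(labelsnames)
print(lexical)   # {'0': 0, '1': 1, '10': 2, '2': 3}

numeric = build_label_map(labelsnames, numeric=True)
print(numeric)   # {'0': 0, '1': 1, '2': 2, '10': 3}

labels = [numeric[name] for name in labelsnames]
print(labels)    # [2, 3, 0, 3, 1, 2]
```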

import numpy as np

a = np.array(labelsnames)
'''
Convert the list to an array
['0' '0' '0' ... '9' '9' '9']
'''
from sklearn.utils import shuffle

b = shuffle(np.asarray(lfilenames), np.asarray(labels))
'''
shuffle puts both arrays into the same random order, so every path stays paired with its label
[array(['mnist_digits_images\8\292.bmp',
       'mnist_digits_images\1\668.bmp',
       'mnist_digits_images\6\121.bmp', ...,
       'mnist_digits_images\7\821.bmp',
       'mnist_digits_images\6\308.bmp',
       'mnist_digits_images\7\286.bmp'], dtype='<U29'), array([8, 1, 6, ..., 7, 6, 7])]

'''

Converting to arrays and shuffling
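
The point of shuffling both sequences in one call is that each path keeps its label. The same idea can be sketched with the standard library only; this is not sklearn's implementation, and the helper name shuffle_together, its seed parameter and the sample paths are made up for illustration.

```python
import random


def shuffle_together(filenames, labels, seed=None):
    """Return both sequences reordered by one shared random permutation."""
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    order = list(range(len(filenames)))
    random.Random(seed).shuffle(order)  # one permutation, applied to both lists
    return [filenames[i] for i in order], [labels[i] for i in order]


# Hypothetical sample data in the same shape as lfilenames and labels.
lfilenames = ["img/0/a.bmp", "img/0/b.bmp", "img/1/c.bmp", "img/2/d.bmp"]
labels = [0, 0, 1, 2]

shuffled_files, shuffled_labels = shuffle_together(lfilenames, labels, seed=0)

# Every (path, label) pair survives the shuffle; only the order changes.
pairs_before = sorted(zip(lfilenames, labels))
pairs_after = sorted(zip(shuffled_files, shuffled_labels))
print(pairs_before == pairs_after)  # True
```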

Original article: https://www.cnblogs.com/liuhuacai/p/11552670.html