• An attempted reproduction of DQN2013 (with various implementation problems and bugs; a personal attempt, not worth following)


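    This code was adapted from various DQN implementations found online. It only implements the basic functionality, has not been adapted to work across different games, and still contains various bugs. It is draft quality and not worth following: an unfinished, flawed attempt made purely as a learning exercise. The core code, a DQN2013 version, is implemented, with PyTorch as the backend.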

    The code is shared at:

    https://gitee.com/devilmaycry812839668/bug-dqn2013

    The code is incomplete and not worth following; it is shared for learning purposes only.

    ========================================

    Still, I learned a few new things from this project:

    The finished_tester.py module shows how to save OpenCV frames as a video:

    from tqdm import trange
    import numpy as np
    import time
    import cv2
    import torch
    
    from config import args
    from utils import seed_device_config
    from agent import Agent
    from environment import ALE
    
    # Configuration for log format&file, seed, torch
    seed_device_config()
    args.render = True
    args.debug = True
    
    data_path = 'videos/'
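    fps = 30  # 30 fps -> ~0.0333 s per frame (fps = 20 would be 0.05 s)
    size = (160, 210)  # cv2.VideoWriter expects (width, height); the ALE screen is 160 wide, 210 tall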
    video = cv2.VideoWriter(f"{data_path+args.game}.avi", cv2.VideoWriter_fourcc(*'XVID'), fps, size)
    
    class ENV(ALE):
        def step(self, action):
            reward, done = 0, False
            frame_buffer = np.zeros((2, 84, 84), dtype=np.uint8)
    
            for t in range(self.action_repeat):
                reward += self.ale.act(self.actions[action])
                frame_buffer[t % 2] = self._resize_state()
                done = self.ale.game_over()
    
                lives = self.ale.lives()
                if lives < self.lives:
                    if lives > 0:
                        self.life_termination = True
                    done = True
                if done:
                    break
    
            observation = frame_buffer.max(0)
            self.state_buffer.append(observation)
    
            if reward != 0:
                print("reward: ", reward, "lives: ", lives, "  done: ", self.ale.game_over())
                reward = reward // abs(reward)
    
            return np.stack(self.state_buffer), reward, done
    
    
    env = ENV()
    
    # Agent
    agt = Agent()
    dqn = agt.dqn
    p = "/home/devil/pycharm_project_857/exp/breakout/440935/2022_01_31-14_12_34/"+"3200000_model.pth"
    dqn.load_state_dict(torch.load(p))
    
    T_rewards = []
    state, reward_sum, done = env.reset(), 0, False
    
    ts = 0.05
    if args.render:
        env.render()
        time.sleep(ts)
    
    for step in range(args.evaluation_steps):
    # for _ in trange(args.evaluation_steps):
        action = agt.act_e_greedy(state)
        state, reward, done = env.step(action)
        reward_sum += reward
    
        if args.render:
            env.render()
            img = env.ale.getScreenRGB()[:, :, ::-1]
            video.write(img)
            time.sleep(ts)
        if done:
            print("reward_sum: ", reward_sum, " steps: ", step,  "  lives: ", env.ale.lives())
            T_rewards.append(reward_sum)
            state, reward_sum, done = env.reset(), 0, False
    
            if args.render:
                env.render()
                img = env.ale.getScreenRGB()[:, :, ::-1]
                video.write(img)
                time.sleep(ts)
    
    avg_reward = sum(T_rewards) / max(1, len(T_rewards))
    print(avg_reward)
    
    if args.render:
        env.close()
    video.release()
    
    
    """
    import cv2
    import atari_py
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    ale = atari_py.ALEInterface()
    ale.loadROM(atari_py.get_game_path('breakout'))
    
    state = cv2.resize(
                ale.getScreenGrayscale(),
                (84, 110),
                interpolation=cv2.INTER_LINEAR)
    state = state[26:, :]
    plt.imshow(state)
    """

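    Core code:

    import os
    import cv2
    
    
    data_path = 'videos/'
    os.makedirs(data_path, exist_ok=True)  # VideoWriter fails silently if the folder is missing
    fps = 30  # 30 fps -> ~0.0333 s per frame (fps = 20 would be 0.05 s)
    size = (160, 210)  # (width, height), matching the 210x160 ALE screen
    video = cv2.VideoWriter(f"{data_path}breakout.avi", cv2.VideoWriter_fourcc(*'XVID'), fps, size)
    
    
    # Record one frame (call after every env.step); ALE returns RGB, OpenCV expects BGR
    img = env.ale.getScreenRGB()[:, :, ::-1]
    video.write(img)
    
    
    # Finalize the .avi file after the last frame has been written
    video.release()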

    This collects every recorded frame and packs 30 of them into each second of video, producing an .avi file at the end.
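    A minimal NumPy-only sketch of the two details the recorder depends on: cv2.VideoWriter takes frameSize as (width, height) while getScreenRGB() returns a (height, width, 3) RGB array, and OpenCV expects BGR channel order. The helper names are hypothetical and a zero array stands in for a real screen. Because the tester writes one frame per agent step, if each step repeats the action k times (args.action_repeat, whose value is not shown here) on the 60 Hz Atari, a 30 fps video plays at 30·k/60 times real speed (2× for k = 4).

```python
import numpy as np

FPS = 30            # frames per second passed to cv2.VideoWriter
SIZE = (160, 210)   # cv2.VideoWriter frameSize is (width, height)

def rgb_to_bgr(frame_rgb: np.ndarray) -> np.ndarray:
    """Reverse the channel axis (RGB -> BGR) and return a contiguous copy."""
    return np.ascontiguousarray(frame_rgb[:, :, ::-1])

def matches_writer_size(frame: np.ndarray, size=SIZE) -> bool:
    """A (height, width, 3) array matches a (width, height) frameSize."""
    height, width = frame.shape[:2]
    return (width, height) == tuple(size)

def video_seconds(n_frames: int, fps: int = FPS) -> float:
    """Playback length of n_frames written at the given fps."""
    return n_frames / fps

# Stand-in for env.ale.getScreenRGB(): height x width x channels
frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[..., 0] = 255                     # pure red in RGB order
bgr = rgb_to_bgr(frame)
print(bgr[0, 0].tolist())               # [0, 0, 255]: red moves to the last slot in BGR
print(matches_writer_size(bgr))         # True
print(video_seconds(900))               # 30.0
```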

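    The atari_env.py module shows how to drive the emulator directly with atari_py instead of going through gym:

    When gym simulates Atari games, it uses atari_py underneath (in older gym releases; newer ones switched to ale-py). In other words, gym is a wrapper around atari_py, so we can skip gym and call atari_py directly.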

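    import atari_py
    
    from config import args  # project config providing game, seed, max_episode_length
    
    ale = atari_py.ALEInterface()
    
    # ALE applies these settings when the ROM is loaded, so set them before loadROM
    ale.setInt('random_seed', args.seed)
    ale.setInt('max_num_frames_per_episode', args.max_episode_length)
    ale.setFloat('repeat_action_probability', 0)  # can be changed (sticky actions)
    
    ale.loadROM(atari_py.get_game_path(args.game))
    actions = ale.getMinimalActionSet()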
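    A minimal sketch of one agent step when atari_py is driven directly, following the logic of ENV.step above: repeat the action, max-pool the last two frames, and clip the reward to its sign. FakeALE is a hypothetical scripted stand-in that only mirrors the method names act, game_over, lives and getScreenGrayscale so the example runs without ROMs; with a real emulator you would pass the ALEInterface and resize frames to 84x84 as _resize_state does.

```python
import numpy as np

class FakeALE:
    """Hypothetical stand-in for atari_py.ALEInterface with scripted outputs."""

    def __init__(self, rewards, lives, frames):
        self.rewards, self.lives_seq, self.frames = rewards, lives, frames
        self.t = 0  # number of act() calls so far

    def act(self, action):
        r = self.rewards[self.t]
        self.t += 1
        return r

    def game_over(self):
        return self.t >= len(self.rewards)

    def lives(self):
        return self.lives_seq[max(self.t - 1, 0)]

    def getScreenGrayscale(self):
        return self.frames[max(self.t - 1, 0)]


def repeat_step(ale, action, action_repeat, lives_before):
    """One agent step: repeat `action`, max-pool the last two frames, clip reward to its sign."""
    reward, done, frames = 0, False, []
    for _ in range(action_repeat):
        reward += ale.act(action)
        frames.append(ale.getScreenGrayscale())
        done = ale.game_over() or ale.lives() < lives_before  # a lost life also ends the step
        if done:
            break
    observation = np.max(np.stack(frames[-2:]), axis=0)  # reduces Atari sprite flicker
    return observation, int(np.sign(reward)), done


frames = [np.full((2, 2), v, dtype=np.uint8) for v in (10, 40, 30, 20, 5, 5)]
ale = FakeALE(rewards=[0, 2, 1, 0, 0, 0], lives=[5, 5, 5, 5, 4, 4], frames=frames)
obs, r, done = repeat_step(ale, action=1, action_repeat=4, lives_before=5)
print(obs[0, 0], r, done)  # 30 1 False: max(30, 20), sign(3), no life lost
```

    Unlike ENV.step, which pools over a zero-initialized two-frame buffer, this sketch pools over the frames actually captured, so a step that ends after one frame returns that frame unchanged.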

    Create the emulator interface object:

    ale = atari_py.ALEInterface()

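    Load the game's ROM file (ALE reads settings such as the seed when the ROM is loaded, so the setInt/setFloat calls should come before this one):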

    ale.loadROM(atari_py.get_game_path(args.game))

    For example, to load Breakout directly:

    ale.loadROM(atari_py.get_game_path("breakout"))

    Set the environment's random seed:

    ale.setInt('random_seed', args.seed)

    Set the maximum number of frames in one episode. Here "episode" means the whole game across all lives, not a single life.

    ale.setInt('max_num_frames_per_episode', args.max_episode_length)
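    Because the limit counts emulator frames for the whole game, a quick conversion helps when choosing args.max_episode_length. This sketch assumes the Atari runs at 60 frames per second and that each ale.act() call advances one frame (ALE's frame_skip left at its default of 1); 108,000 frames is only an illustrative value, since the project's actual setting is not shown.

```python
ATARI_HZ = 60  # the Atari 2600 (NTSC) renders 60 frames per second

def episode_budget(max_frames, action_repeat):
    """Return (minutes of game time, agent steps) allowed by max_num_frames_per_episode,
    assuming each ale.act() call advances exactly one frame."""
    minutes = max_frames / ATARI_HZ / 60
    agent_steps = max_frames // action_repeat
    return minutes, agent_steps

print(episode_budget(108_000, 4))  # (30.0, 27000)
```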

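    Set the probability that the emulator repeats the previous action ("sticky actions"). If this value is nonzero, the emulator will, with this probability, repeat the previous action and ignore the action just given: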

    ale.setFloat('repeat_action_probability', 0)
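    A small plain-Python simulation makes the effect concrete. It is a simplification, not ALE's code: ALE makes this decision inside the emulator on every frame, while the sketch decides once per call, and the starting action 0 (NOOP) is an assumption. With the probability at 0, as in this project, every requested action is executed.

```python
import random

def sticky_action(prev_action, new_action, repeat_prob, rng):
    """With probability repeat_prob, keep prev_action and ignore new_action."""
    return prev_action if rng.random() < repeat_prob else new_action

def executed_actions(requested, repeat_prob, seed=0):
    """Return the actions actually executed for a sequence of requested actions."""
    rng = random.Random(seed)
    prev, executed = 0, []  # assumption: the emulator starts from NOOP (0)
    for a in requested:
        prev = sticky_action(prev, a, repeat_prob, rng)
        executed.append(prev)
    return executed

requested = [1, 3, 3, 4, 4, 1]
print(executed_actions(requested, 0.0))  # [1, 3, 3, 4, 4, 1]: nothing is repeated
print(executed_actions(requested, 1.0))  # [0, 0, 0, 0, 0, 0]: the initial action sticks
```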

    Get the game's minimal action set:

    actions = ale.getMinimalActionSet()
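    The minimal action set is why the Q-network only needs one output per legal action: the agent picks an index, and ENV.step maps it back through self.actions[action]. A hedged sketch of that mapping; the Breakout set [0, 1, 3, 4] (NOOP, FIRE, RIGHT, LEFT) is given for illustration and should be checked against getMinimalActionSet() on your installation.

```python
import numpy as np

# Illustrative only: Breakout's minimal action set as ALE action ids
# (0 = NOOP, 1 = FIRE, 3 = RIGHT, 4 = LEFT); verify with ale.getMinimalActionSet().
BREAKOUT_MINIMAL = [0, 1, 3, 4]

def greedy_ale_action(q_values, minimal_actions):
    """Return (network index, ALE action id) for the highest Q-value."""
    q_values = np.asarray(q_values)
    if q_values.shape[-1] != len(minimal_actions):
        raise ValueError("the network needs one output per action in the minimal set")
    index = int(np.argmax(q_values))
    return index, minimal_actions[index]

print(greedy_ale_action([0.1, -0.3, 0.7, 0.2], BREAKOUT_MINIMAL))  # (2, 3): index 2 -> RIGHT
```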

    List the games whose ROMs can be loaded in the current environment:

    atari_py.list_games()

    ['tetris', 'lost_luggage', 'pitfall2', 'pong', 'koolaid', 'breakout', 'hero', 'jamesbond', 'alien', 'road_runner', 'tennis', 'beam_rider', 'entombed', 'freeway', 'double_dunk', 'seaquest', 'king_kong', 'backgammon', 'casino', 'tic_tac_toe_3d', 'mr_do', 'zaxxon', 'ice_hockey', 'frogger', 'gravitar', 'private_eye', 'centipede', 'video_cube', 'adventure', 'defender', 'combat', 'star_gunner', 'flag_capture', 'othello', 'donkey_kong', 'et', 'air_raid', 'battle_zone', 'haunted_house', 'bank_heist', 'klax', 'up_n_down', 'pooyan', 'fishing_derby', 'darkchambers', 'kangaroo', 'warlords', 'berzerk', 'kung_fu_master', 'earthworld', 'demon_attack', 'superman', 'venture', 'tutankham', 'enduro', 'yars_revenge', 'joust', 'robotank', 'journey_escape', 'time_pilot', 'atlantis', 'pitfall', 'video_chess', 'atlantis2', 'basic_math', 'crossbow', 'word_zapper', 'mario_bros', 'sir_lancelot', '__init__', 'hangman', 'laser_gates', 'asterix', 'carnival', 'turmoil', 'keystone_kapers', 'surround', 'amidar', 'krull', 'pacman', 'phoenix', 'qbert', 'space_invaders', 'riverraid', 'skiing', 'ms_pacman', 'kaboom', 'solaris', 'maze_craze', 'human_cannonball', 'blackjack', 'frostbite', 'space_war', 'boxing', 'name_this_game', 'gopher', 'elevator_action', 'crazy_climber', 'asteroids', 'video_pinball', 'assault', 'bowling', 'montezuma_revenge', 'trondead', 'wizard_of_wor', 'video_checkers', 'galaxian', 'miniature_golf', 'chopper_command']
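    The list above also contains '__init__', which is not a game. A small, hypothetical filter can clean the list and let you check a name before calling get_game_path; the sample list stands in for the real atari_py.list_games() output so the example runs without atari_py installed.

```python
def playable_games(names):
    """Drop non-game entries such as '__init__' and return the names sorted."""
    return sorted(n for n in names if not n.startswith("__"))

# Stand-in for atari_py.list_games(); with atari_py installed, pass that list instead.
sample = ['tetris', 'pong', '__init__', 'breakout']
games = playable_games(sample)
print(games)                # ['breakout', 'pong', 'tetris']
print('breakout' in games)  # True
```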

    ==============================================

  • Original post: https://www.cnblogs.com/devilmaycry812839668/p/15859010.html