Continuing from the previous post:
Installing NVIDIA Isaac Gym, NVIDIA's GPU-accelerated robot simulation environment — a simulation training environment for reinforcement learning
This post mainly gives example commands for running the PPO examples that NVIDIA Isaac Gym ships with under PyTorch.
Below are several examples of training Isaac Gym environments with the reinforcement learning code under the rlgpu directory.
The examples below use the file /home/devil/isaacgym/python/rlgpu/train.py (train.py under the rlgpu directory).
Use the help option to view the command-line arguments of the reinforcement learning scripts NVIDIA provides:
python train.py -h
RL Policy

optional arguments:
  -h, --help            show this help message and exit
  --sim_device SIM_DEVICE
                        Physics Device in PyTorch-like syntax
  --pipeline PIPELINE   Tensor API pipeline (cpu/gpu)
  --graphics_device_id GRAPHICS_DEVICE_ID
                        Graphics Device ID
  --flex                Use FleX for physics
  --physx               Use PhysX for physics
  --num_threads NUM_THREADS
                        Number of cores used by PhysX
  --subscenes SUBSCENES
                        Number of PhysX subscenes to simulate in parallel
  --slices SLICES       Number of client threads that process env slices
  --test                Run trained policy, no training
  --play                Run trained policy, the same as test, can be used only
                        by rl_games RL library
  --resume RESUME       Resume training or start testing from a checkpoint
  --checkpoint CHECKPOINT
                        Path to the saved weights, only for rl_games RL library
  --headless            Force display off at all times
  --horovod             Use horovod for multi-gpu training, have effect only
                        with rl_games RL library
  --task TASK           Can be BallBalance, Cartpole, CartpoleYUp, Ant,
                        Humanoid, Anymal, FrankaCabinet, Quadcopter,
                        ShadowHand, Ingenuity
  --task_type TASK_TYPE
                        Choose Python or C++
  --rl_device RL_DEVICE
                        Choose CPU or GPU device for inferencing policy network
  --logdir LOGDIR
  --experiment EXPERIMENT
                        Experiment name. If used with --metadata flag an
                        additional information about physics engine, sim
                        device, pipeline and domain randomization will be
                        added to the name
  --metadata            Requires --experiment flag, adds physics engine, sim
                        device, pipeline info and if domain randomization is
                        used to the experiment name provided by user
  --cfg_train CFG_TRAIN
  --cfg_env CFG_ENV
  --num_envs NUM_ENVS   Number of environments to create - override config file
  --episode_length EPISODE_LENGTH
                        Episode length, by default is read from yaml config
  --seed SEED           Random seed
  --max_iterations MAX_ITERATIONS
                        Set a maximum number of training iterations
  --steps_num STEPS_NUM
                        Set number of simulation steps per 1 PPO iteration.
                        Supported only by rl_games. If not -1 overrides the
                        config settings.
  --minibatch_size MINIBATCH_SIZE
                        Set batch size for PPO optimization step. Supported
                        only by rl_games. If not -1 overrides the config
                        settings.
  --randomize           Apply physics domain randomization
  --torch_deterministic
                        Apply additional PyTorch settings for more
                        deterministic behaviour
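If you want to script around these options (for example, to build launch commands programmatically), here is a minimal sketch of how a few of the flags above could be declared with Python's argparse. This is an illustration, not the actual rlgpu source: the flag names and help strings are copied from the -h output, while the defaults are invented.

import argparse

# Sketch of a small subset of the CLI surface shown above;
# not the actual rlgpu implementation.
parser = argparse.ArgumentParser(description="RL Policy")
parser.add_argument("--sim_device", default="cuda:0",
                    help="Physics Device in PyTorch-like syntax")
parser.add_argument("--rl_device", default="cuda:0",
                    help="Choose CPU or GPU device for inferencing policy network")
parser.add_argument("--physx", action="store_true",
                    help="Use PhysX for physics")
parser.add_argument("--num_threads", type=int, default=0,
                    help="Number of cores used by PhysX")
parser.add_argument("--headless", action="store_true",
                    help="Force display off at all times")
parser.add_argument("--task", default="Cartpole",
                    help="BallBalance, Cartpole, Ant, Humanoid, ShadowHand, ...")
args = parser.parse_args()
print(args)  # e.g. Namespace(sim_device='cpu', rl_device='cpu', ...)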
============================================
Example run commands:
1. Simulation on CPU, training on CPU
Run the simulation environment on the CPU while the PPO deep reinforcement learning algorithm also trains on the CPU:
python train.py --task=ShadowHand --headless --sim_device=cpu --rl_device=cpu --physx --num_threads=24
2. Simulation on CPU, training on GPU
python train.py --task=ShadowHand --headless --sim_device=cpu --rl_device=cuda:0 --physx --num_threads=24
3. Simulation on GPU, training on CPU
python train.py --task=ShadowHand --headless --sim_device=cuda:0 --rl_device=cpu --physx --num_threads=24
4. Simulation on GPU, training on GPU
Simulate on GPU 0 and train on GPU 1:
python train.py --task=ShadowHand --headless --sim_device=cuda:0 --rl_device=cuda:1 --physx --num_threads=24
Simulate on GPU 1 and train on GPU 0:
python train.py --task=ShadowHand --headless --sim_device=cuda:1 --rl_device=cuda:0 --physx --num_threads=24
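Before launching the variants above, it can be worth confirming that the device indices (cuda:0, cuda:1) and the 24 PhysX threads used in these commands actually exist on your machine. A minimal check with plain PyTorch and the standard library, independent of Isaac Gym:

import os
import torch

# --num_threads should not exceed the number of CPU cores.
print("CPU cores:", os.cpu_count())

# The indices in --sim_device / --rl_device must be valid CUDA devices.
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))

If only one GPU is reported, use the single-GPU form (--sim_device=cuda:0 --rl_device=cuda:0) instead of the two-GPU commands above.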
=============================================