1. Clone the repository
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples/PyTorch/LanguageModeling/BERT
2. Build and launch the Docker container
# If you are behind a proxy, add the following lines to the Dockerfile (replace xxx with your proxy URL):
export http_proxy='xxx'
export https_proxy='xxx'
export ftp_proxy='xxx'
sudo bash scripts/docker/build.sh
sudo bash scripts/docker/launch.sh
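As an alternative to hardcoding the exports in the Dockerfile, the host's proxy settings can be forwarded at build time. A minimal sketch, assuming a plain `docker build` is acceptable in place of the build script; `proxy_args` and the `bert_pytorch` image tag are hypothetical names, not part of the repository:

```shell
# proxy_args prints one --build-arg flag per proxy variable that is set
# on the host, so unset proxies are simply skipped.
proxy_args() {
  for var in http_proxy https_proxy ftp_proxy; do
    eval "val=\${$var:-}"
    [ -n "$val" ] && printf -- '--build-arg %s=%s ' "$var" "$val"
  done
  return 0
}

# usage (hypothetical image tag):
# docker build $(proxy_args) -t bert_pytorch .
```

Docker treats `http_proxy`, `https_proxy`, and `ftp_proxy` as predefined build args, so no extra `ARG` lines are needed in the Dockerfile for this to take effect during the build.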
3. Create the dataset (this takes a long time)
Note: make sure the Wikipedia download completes fully.
If the LDDL download fails, remove the output directory data/wikipedia/ and start over.
/workspace/bert/data/create_datasets_from_start.sh
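The retry advice above can be sketched as a small helper; `clean_partial_download` is a hypothetical name, and `WIKI_DIR` is an assumption about the output path relative to where the script is run:

```shell
# If a previous LDDL download left a partial output directory,
# remove it before rerunning the dataset script.
WIKI_DIR=${WIKI_DIR:-data/wikipedia}

clean_partial_download() {
  # $1: output directory of the failed download
  if [ -d "$1" ]; then
    echo "removing partial download: $1"
    rm -rf "$1"
  fi
}

# usage:
# clean_partial_download "$WIKI_DIR"
# /workspace/bert/data/create_datasets_from_start.sh
```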
4. Run BERT pretraining
bash scripts/run_pretraining.sh
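Pretraining runs for a long time, so it helps to capture the output to a timestamped log file. A minimal sketch, assuming it is acceptable to wrap the command; `run_with_log` and `LOG_DIR` are hypothetical names, not part of the repository's scripts:

```shell
# run_with_log runs a command, mirroring its combined stdout/stderr to
# the terminal and to a timestamped log file under $LOG_DIR.
LOG_DIR=${LOG_DIR:-logs}

run_with_log() {
  mkdir -p "$LOG_DIR"
  local log="$LOG_DIR/$(date +%Y%m%d_%H%M%S)_$1.log"
  shift
  "$@" 2>&1 | tee "$log"
}

# usage:
# run_with_log pretrain bash scripts/run_pretraining.sh
```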