First, the BERT paper and the "Attention Is All You Need" paper.
Then, the development of NLP pretraining and transfer learning, from word2vec to ELMo and BERT:
https://mp.weixin.qq.com/s/Rd3-ypRYiJObi-e2JDeOjQ
https://mp.weixin.qq.com/s/7imMQ3GkD52xP7N4fqNPog
Explanations of the Transformer:
http://nlp.seas.harvard.edu/2018/04/03/attention.html
https://jalammar.github.io/illustrated-transformer/
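The core operation both linked posts explain is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (function and variable names are mine, not from the linked code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores are scaled by sqrt(d_k) to keep softmax gradients healthy
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights

# toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one weighted mix of the values per query
```

Multi-head attention, as the posts show, just runs several of these in parallel on learned linear projections of Q, K, and V and concatenates the results.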
A walkthrough of the BERT model code:
https://blog.csdn.net/weixin_39470744/article/details/84401339
Multi-label classification with BERT:
https://github.com/brightmart/sentiment_analysis_fine_grain/blob/master/run_classifier_multi_labels_bert.py
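The usual change for multi-label classification (and, as I understand it, what the linked script does) is to replace the single softmax over classes with an independent sigmoid per label and a binary cross-entropy loss. A hedged NumPy sketch of that loss, with illustrative toy data of my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_loss(logits, labels):
    # per-label sigmoid cross-entropy, summed over labels, averaged over batch;
    # unlike softmax, each label is an independent yes/no decision
    p = sigmoid(logits)
    eps = 1e-12  # guard against log(0)
    per_label = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return per_label.sum(axis=-1).mean()

# toy batch: 2 examples, 4 labels
# (in BERT, logits would come from a linear layer on the pooled [CLS] output)
logits = np.array([[2.0, -1.0, 0.5, -3.0],
                   [-0.5, 1.5, -2.0, 0.0]])
labels = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1]], dtype=float)
loss = multi_label_loss(logits, labels)
preds = (sigmoid(logits) > 0.5).astype(int)  # threshold each label at 0.5
```

In TensorFlow this corresponds to `tf.nn.sigmoid_cross_entropy_with_logits`; the key point is that several labels can be active at once, which softmax cannot express.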