The goal of this run is to evaluate BioBERT on question answering (QA), using the BioASQ challenge data.
A challenge on large-scale biomedical semantic indexing and question answering
http://bioasq.org/
http://participants-area.bioasq.org/Tasks/
Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers, M. R., Weissenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D., Almirantis, Y., Pavlopoulos, J., Baskiotis, N., Gallinari, P., Artiéres, T., Ngomo, A. C. N., Heino, N., Gaussier, E., Barrio-Alvers, L., … Paliouras, G. (2015). An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16(1), 1–28. https://doi.org/10.1186/s12859-015-0564-6
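Model: biobert_v1.0_pubmed_pmc
Training data: QA.zip, the preprocessed BioASQ-4/5/6b datasets, provided by the Korea University team
Test data: BioASQ-TaskB-testData.zip, provided by the challenge organizers
Note: the data files need to be renamed so that they match the paths passed to --train_file and --predict_file.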
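The renaming step can be scripted. The following is a minimal sketch, not part of the BioBERT tooling: the helper `rename_files` and the source names in `HYPOTHETICAL_MAPPING` are assumptions made for illustration (the actual names inside the archives are not recorded in these notes); only the target names are taken from the command used in this run.

```python
from pathlib import Path

# Source names (keys) are HYPOTHETICAL placeholders; replace them with the
# actual file names found in the unpacked archives. Target names (values)
# are the ones passed to --train_file / --predict_file in this run.
HYPOTHETICAL_MAPPING = {
    "train_original.json": "BioASQ-train-4b.json",
    "test_original.json": "BioASQ-test-4b-1.json",
}


def rename_files(data_dir, mapping):
    """Rename files inside data_dir according to mapping (old name -> new name).

    Missing source files are skipped; an existing target is never overwritten.
    Returns the list of new paths that were actually created.
    """
    data_dir = Path(data_dir)
    renamed = []
    for old_name, new_name in mapping.items():
        src = data_dir / old_name
        dst = data_dir / new_name
        if not src.exists():
            continue  # nothing to rename for this entry
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Calling `rename_files(os.environ["BIOASQ_DIR"], HYPOTHETICAL_MAPPING)` (after `import os`) would apply the mapping to the data directory; entries whose source file does not exist are skipped.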
python run_qa.py \
    --do_train=True \
    --do_predict=True \
    --vocab_file=$BIOBERT_DIR/vocab.txt \
    --bert_config_file=$BIOBERT_DIR/bert_config.json \
    --init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt \
    --max_seq_length=384 \
    --train_batch_size=12 \
    --learning_rate=5e-6 \
    --doc_stride=128 \
    --num_train_epochs=5.0 \
    --do_lower_case=False \
    --train_file=$BIOASQ_DIR/BioASQ-train-4b.json \
    --predict_file=$BIOASQ_DIR/BioASQ-test-4b-1.json \
    --output_dir=QA_output/
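To see how --max_seq_length=384 and --doc_stride=128 interact, here is a small sketch of the sliding-window split applied to long contexts. It assumes run_qa.py follows the windowing logic of the original BERT run_squad.py, which it is derived from; the function name `doc_spans` and its token-count arguments are illustrative and do not appear in the script.

```python
def doc_spans(num_doc_tokens, num_query_tokens, max_seq_length=384, doc_stride=128):
    """Return (start, length) windows over the context tokens.

    Assumes the run_squad.py scheme: each input is
    [CLS] query [SEP] context-window [SEP], so three positions are reserved.
    """
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("query is too long for the given max_seq_length")

    spans = []
    start = 0
    while start < num_doc_tokens:
        # Each window holds as many context tokens as still fit.
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the context
        # Advance by at most doc_stride, so consecutive windows overlap.
        start += min(length, doc_stride)
    return spans
```

Under these assumptions, a 1000-token context with a 21-token question leaves 360 context tokens per window, and windows start every 128 tokens: `doc_spans(1000, 21)` yields six windows starting at 0, 128, 256, 384, 512, and 640. A context that fits in one window, such as `doc_spans(100, 21)`, yields a single span.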