Abstract: Pre-trained language models, such as GPT and BERT, have revolutionized natural language processing tasks across various fields. However, the current multi-head self-attention mechanisms in ...