With recent technological breakthroughs, researchers have begun to apply a range of machine learning techniques to the wealth of available biomedical data. Techniques such as text mining and knowledge extraction from the biomedical literature have proven essential for the development of new drugs, clinical treatments, pathology research, and more. With ongoing scientific advances, more and more biomedical publications are published every day, and deriving meaningful information from this material is a growing challenge. This is where pre-trained language models come into play. Biomedical researchers have shown great interest in pre-trained language models because of their strong performance in the general natural language domain.
However, the performance of these models when applied directly to the biomedical field is not satisfactory. Such models excel at a variety of discriminative downstream biomedical tasks, but their lack of generative capacity limits their applicability. To address this problem, researchers have pre-trained models on biomedical texts. The two main branches of pre-trained language models in the general language domain are GPT and BERT, together with their variants, and of these, BERT has received the most attention in the biomedical field. BioBERT and PubMedBERT are two of the most popular pre-trained language models in the biomedical domain, achieving superior performance compared to other popular pre-trained models on biomedical texts.
However, the majority of current research utilizes BERT models, which are better suited to comprehension tasks than to generative tasks. Although the GPT model has proven adept at generative tasks, its performance in the biomedical field has not yet been fully scrutinized. To address this gap, Microsoft researchers recently introduced BioGPT, a domain-specific generative Transformer language model pre-trained on extensive biomedical literature. BioGPT is pre-trained on a massive corpus of 15 million PubMed abstracts and built on the Transformer language model. The researchers evaluated the model on six biomedical NLP tasks, among them question answering, document classification, and end-to-end relation extraction. Experimental evaluations show that BioGPT significantly outperforms alternative baseline models on most of these tasks.
A high-quality dataset is very important for pre-training a language model, so the researchers pre-trained the model from scratch on in-domain text data from PubMed. The GPT-2 model, which is essentially a Transformer decoder, serves as the backbone of BioGPT. However, rather than reusing the GPT-2 vocabulary, the researchers learned the vocabulary on the collected in-domain corpus using byte-pair encoding. A key component of the BioGPT model is the multi-head attention layer, which produces queries Q, keys K, and values V through three linear transformations. These are used to compute the output of the multi-head attention layer, which is then passed to a feed-forward layer to form the Transformer block.
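To make that description concrete, here is a minimal PyTorch sketch of such a decoder-style Transformer block: three linear transformations produce Q, K, and V, multi-head attention combines them under a causal mask, and a feed-forward sub-layer completes the block. This is only an illustration of the general mechanism; the hidden size, head count, and pre-norm layout are assumptions, not the official BioGPT implementation.

import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Queries, keys, and values come from three linear transformations of the input."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)  # produces Q
        self.k_proj = nn.Linear(d_model, d_model)  # produces K
        self.v_proj = nn.Linear(d_model, d_model)  # produces V
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        # Project and split into heads: (batch, heads, seq, head_dim).
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention with a causal mask (decoder-style model).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        out = scores.softmax(dim=-1) @ v
        return self.out_proj(out.transpose(1, 2).reshape(B, T, D))


class TransformerBlock(nn.Module):
    """Attention sub-layer followed by a feed-forward sub-layer, with residual connections."""

    def __init__(self, d_model: int = 1024, n_heads: int = 16, d_ff: int = 4096):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))  # residual connection around attention
        x = x + self.ff(self.norm2(x))    # residual connection around feed-forward
        return x


# Quick check: a batch of 2 sequences of 8 token embeddings.
print(TransformerBlock()(torch.randn(2, 8, 1024)).shape)  # torch.Size([2, 8, 1024])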
The pre-trained model was then fine-tuned for downstream tasks such as text generation, question answering, and end-to-end relation extraction. All of these tasks take the same type of input, a sequence, but differ in output format, so when applying the pre-trained BioGPT to them, the researchers carefully designed the format of the prompt and target sequences. BioGPT achieves state-of-the-art performance on three end-to-end relation extraction tasks and one question answering task. Moreover, it outperforms GPT-2 in biomedical text generation. The Microsoft research team plans to train BioGPT on even larger biomedical data in the future and adapt it to additional downstream tasks. A basic implementation of BioGPT can be found below.
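As a starting point, here is a minimal usage sketch based on the Hugging Face transformers integration of BioGPT; it assumes a recent transformers release that ships the BioGptTokenizer and BioGptForCausalLM classes and the "microsoft/biogpt" checkpoint, and the prompt and generation settings are illustrative only.

import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

# Load the pre-trained BioGPT checkpoint and its tokenizer.
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")
model.eval()

# Prompt BioGPT to continue a biomedical sentence.
prompt = "COVID-19 is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=50,      # total length of prompt plus generated continuation
        num_beams=5,        # beam search for more fluent biomedical text
        early_stopping=True,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))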
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to join our 13k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development. She enjoys learning more about the technical field by participating in various challenges.