2023 ACMG Annual Clinical Genetics Meeting Digital ...
Automated Identification and Indexing of Genes and Variants using Bidirectional Encoder Representation from Transformers
PDF Summary
This summary presents preliminary results from a study on the automated identification and indexing of genes and variants using Bidirectional Encoder Representations from Transformers (BERT) models. The methodology has three stages: genetic notes are annotated with the INCEpTION annotation tool, the annotated files are curated into a usable dataset, and the SciBERT model is trained for named entity recognition and relation extraction. The trained model is then tested to generate the results reported below.

The study focuses on identifying genes and variants in electronic medical records (EMRs) to improve the efficiency and accuracy of clinical research. Doing this manually is time-consuming and error-prone, so the researchers aim to automate the process with BERT models. Automated identification of genes and variants in EMRs could accelerate clinical research and support tailored treatment plans for patients with rare diseases.

For gene entities, the precision, recall, and F1 score are 0.901, 0.729, and 0.806, respectively. For BGR (Brain Gene Registry) gene-variant tuples, they are 0.775, 0.603, and 0.678, and for variants, 0.898, 0.828, and 0.862. These results indicate that the BERT models can identify genes and variants in genetic notes, with precision higher than recall in all three categories.

The study suggests several further directions: fine-tuning BERT models with domain-specific vocabulary, increasing the number of annotation dimensions, and comparing metrics on diverse datasets from multiple systems.
Establishing context between identified entities is also highlighted as crucial in natural language processing tasks.

The data used in the study come from 3,100 patients who underwent Next Generation Sequencing (NGS); 50,000 free-text genetic notes were downloaded from the electronic health record (EHR) system.

Overall, the study aims to demonstrate that BERT models can automate the identification of genes and variants in EMRs, and to provide a method that future research can build on to optimize the model for different types of genetic notes and to integrate the extracted information into downstream analyses. The work is supported by funding from the National Institutes of Health.
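The precision, recall, and F1 values in this summary are related by the standard harmonic-mean formula, F1 = 2PR / (P + R). The sketch below is not the authors' evaluation code. It recomputes the reported F1 scores from the reported precision and recall, and shows one common way to score entities: exact matching on spans. The span representation (note ID, start offset, end offset, label), the helper names `f1` and `prf`, and the toy spans are hypothetical choices for illustration; the summary does not state which matching criteria the study used.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (note_id, start_char, end_char, label).
# The study's actual annotation schema (e.g. its INCEpTION export) may differ.
Entity = Tuple[str, int, int, str]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1.

    A prediction counts as a true positive only if note, offsets,
    and label all match a gold entity exactly.
    """
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# (precision, recall) pairs as reported in the summary; F1 is recomputed.
REPORTED = {
    "gene": (0.901, 0.729),
    "gene-variant tuple": (0.775, 0.603),
    "variant": (0.898, 0.828),
}

if __name__ == "__main__":
    for name, (p, r) in REPORTED.items():
        print(f"{name}: F1 = {f1(p, r):.3f}")

    # Toy spans, invented for illustration (not study data).
    gold = {("note1", 10, 14, "GENE"), ("note1", 20, 30, "VARIANT"), ("note2", 0, 5, "GENE")}
    pred = {("note1", 10, 14, "GENE"), ("note2", 0, 5, "VARIANT")}
    print(prf(gold, pred))
```

Running the script prints F1 values of 0.806, 0.678, and 0.862, matching the reported scores to three decimals. In the toy example, one of two predictions is correct and one of three gold entities is found, giving a precision of 0.5, a recall of about 0.333, and an F1 of 0.4. Exact matching is strict; a partial-overlap criterion is more lenient, so the matching rule matters when comparing against these results.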
Submitter Only - Prabhu Shankar, MD, MS; Presenting Author - Jayneel Vora, BTech; Co-Author - Suma P. Shankar, MD, PhD; Co-Author - Leonard Abbeduto, Dr, PhD; Co-Author - Abigail Higareda, BA
Meta Tag
Databases
Keywords
automated identification
genes
variants
BERT models
EMRs
clinical research
precision
recall
F1 score
Next Generation Sequencing
© 2025 American College of Medical Genetics and Genomics. All rights reserved.