2023 ACMG Annual Clinical Genetics Meeting Digital ...
Anonymous is Synonymous: Utilizing De-identified Genetic and Clinical Information to Evaluate Diagnostic Prioritization Algorithms
PDF Summary
The study explores the use of de-identified patient data to validate diagnostic prioritization algorithms for genetic diagnosis. The cohort consisted of 93 patients diagnosed by exome or genome sequencing. A de-identified variant call format (VCF) file was created for each patient by inserting the patient's causative variant(s) into a publicly available VCF. For each patient, a note containing phenotype information was selected and analyzed to extract phenotypes. The original and de-identified sequencing data are referred to as IDSeq and AnonSeq, respectively, and the original and de-identified notes as IDNote and DeIDNote.

The diagnostic variant prioritization algorithm CAVaLRi was run on all four combinations of identified and de-identified information sources, and the average diagnostic variant rank (ADR) was calculated for each combination. De-identifying the notes did not affect ranking when the original sequencing data were used, but using de-identified VCFs did change the rank. The difference in ADR between IDSeq-IDNote and AnonSeq-DeIDNote was 3.14 ranks.

The study concludes that de-identifying the genetic representation significantly affected the ADR, whereas the note de-identification procedure did not. If these effects can be replicated with other diagnostic prioritization algorithms, this data set and approach could accelerate algorithmic development. Future work should focus on streamlining the note de-identification process and improving methods for creating anonymous VCFs. The de-identified data set is available for sharing upon request. The study includes figures showing the average diagnostic rank for each combination of identified and de-identified data, the percentage of cases solved at a given diagnostic variant rank, and PR curves comparing CAVaLRi outputs at a given threshold.
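As a rough illustration of the two summary metrics named above, the sketch below computes an average diagnostic variant rank (ADR) and the share of cases solved at or above a given rank for each pairing of sequencing source and note source. It is not the authors' code and does not call CAVaLRi. The function names and the `toy_ranks` values are hypothetical and invented for the example, and the sketch assumes that a prioritization tool has already reported the rank of the diagnostic variant for each case (1 = top of the list).

```python
from itertools import product
from statistics import mean

# Source labels taken from the summary text.
SEQ_SOURCES = ("IDSeq", "AnonSeq")
NOTE_SOURCES = ("IDNote", "DeIDNote")


def average_diagnostic_rank(ranks):
    """Mean rank of the diagnostic variant across cases (1 = top of list)."""
    return mean(ranks)


def fraction_solved_at(ranks, k):
    """Share of cases whose diagnostic variant is ranked k or better."""
    return sum(r <= k for r in ranks) / len(ranks)


def summarize(ranks_by_combo, k=5):
    """Return {(seq, note): (ADR, fraction solved at rank k)} for all pairings."""
    summary = {}
    for combo in product(SEQ_SOURCES, NOTE_SOURCES):
        ranks = ranks_by_combo[combo]
        summary[combo] = (
            average_diagnostic_rank(ranks),
            fraction_solved_at(ranks, k),
        )
    return summary


# Hypothetical ranks for four made-up cases. These are NOT data from the study;
# they only mimic the reported pattern (notes have no effect, VCFs do).
toy_ranks = {
    ("IDSeq", "IDNote"): [1, 2, 1, 4],
    ("IDSeq", "DeIDNote"): [1, 2, 1, 4],
    ("AnonSeq", "IDNote"): [2, 5, 1, 8],
    ("AnonSeq", "DeIDNote"): [2, 5, 1, 8],
}

summary = summarize(toy_ranks, k=5)
for (seq, note), (adr, solved) in summary.items():
    print(f"{seq}-{note}: ADR={adr:.2f}, solved at rank <= 5: {solved:.0%}")

# Difference in ADR between the fully identified and fully de-identified inputs.
adr_shift = summary[("AnonSeq", "DeIDNote")][0] - summary[("IDSeq", "IDNote")][0]
print(f"ADR shift: {adr_shift:.2f} ranks")
```

With real per-case ranks substituted for `toy_ranks`, the same `summarize` call would give one ADR per combination, and the final subtraction corresponds to the kind of IDSeq-IDNote versus AnonSeq-DeIDNote comparison the summary reports.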
Asset Subtitle
Presenting Author - Brandon Stone, MD; Co-Author - Robert J. Schuetz, BS; Co-Author - Austin A. Antoniou, PhD; Co-Author - Emma Garval, BS; Co-Author - Amad Hussain, BS; Co-Author - Bimal P. Chaudhari, MD MPH
Meta Tag
Bioinformatics
Exome sequencing
Genetic Testing
Genome sequencing
Genomic Methodologies
NextGen Sequencing
Phenotype
Sequencing
Variant Detection
Keywords
de-identified patient data
diagnostic prioritization algorithms
genetic diagnosis
exome sequencing
genome sequencing
variant call format (VCF)
phenotype information
CAVaLRi algorithm
diagnostic variant rank
note de-identification
© 2024 American College of Medical Genetics and Genomics. All rights reserved.