📋 PheKnowVec: Condition, Observation, Measurement, and Procedure Code Set Verification
Description
Verification of the mappings between phenotype source strings and source codes from OMOP common data model vocabularies
Status
In progress 🔨
Team
Code Set Generator: @Tiffany C 

Student Evaluators:

Domain Expert:
Spreadsheet
GitHub Repository

Timeline

 
Code Verification Task: May 25 to Jun 25 (calendar view spans late June to early October 2019)


Project Description


Background

  • "Phenotypes are the measurable biological, behavioral and clinical markers of a condition or disease. The process of deriving research-grade phenotypes from clinical data using computer-executable algorithms is called computational phenotyping (phenotyping for short)" (PMID: 27506131)

Computational phenotyping approaches have great potential to aid in diagnosis, prognosis, therapeutic decision-making, and identification of mechanisms or novel biomarkers. Currently, these methods have limited:
  • Generalizability because they are tailored to specific source vocabularies or hospital systems.
  • Translational relevance because they primarily rely on clinical data, which requires additional mapping to incorporate, for example, molecular or physiologic data.
  • Scalability because creating definitions is a time-consuming, iterative process requiring both domain expertise and robust external validation.

Objective: Develop a method (PheKnowVec: Phenotype Knowledge Vectors) for deriving, implementing, and validating computational phenotypes that addresses the aforementioned limitations by:
  • Mapping standardized clinical terminology concepts to linked open data.
  • Using embedding methods, which convert large, complex, heterogeneous data into scalable, compressed vectors while preserving semantic information.

Phenotypes
We will implement all phenotypes appropriate for use with pediatric and adult populations from the eMERGE network's Phenotype KnowledgeBase (n=9). Additional information on the phenotypes listed below can be found here.