A pharma research company in Europe had genotyping and sequencing data on a disease from a cohort study. The primary aim of this longitudinal study was to assess the effect of genetic variants in a particular population. The company also ran a follow-up study to measure gene expression in the same cohort.
The challenges were: How can one make sense of this large volume of data? What insights do the data convey? What knowledge can be gained from the data and applied in drug discovery research?
The main aim of the project, as given to Causality Biomodels (CBM), was to identify the functional consequences of the genetic variants and to uncover the molecular mechanisms perturbed by the single nucleotide polymorphisms (SNPs).
Initially, at CBM, we linked each significant SNP with its genetic context, i.e., the genes associated with the SNP. Then, to narrow down the number of SNPs and genes, we selected only the genes that were significantly differentially regulated. Following this, we systematically collected the literature on these genes in the context of the disease, selecting the articles in which the gene and disease terms co-occurred in the abstract.
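The filtering steps above can be sketched in a few lines of Python. This is only an illustration, not the pipeline we ran: the `Snp` record, the function names, the gene and SNP identifiers, and both significance thresholds are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str       # hypothetical SNP identifier
    gene: str       # gene associated with the SNP
    p_value: float  # association p-value from the cohort study


def select_candidate_genes(snps, expression_p, snp_alpha=5e-8, expr_alpha=0.05):
    """Keep genes linked to a significant SNP that are also differentially regulated.

    Both thresholds are assumed defaults, not values from the project.
    """
    significant_genes = {s.gene for s in snps if s.p_value < snp_alpha}
    return sorted(g for g in significant_genes if expression_p.get(g, 1.0) < expr_alpha)


def mentions(term, text):
    """Case-insensitive whole-word match of a term in a text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE) is not None


def cooccurs(abstract, gene, disease_terms):
    """True if the gene and at least one disease term both appear in the abstract."""
    return mentions(gene, abstract) and any(mentions(t, abstract) for t in disease_terms)


# Made-up example data.
snps = [
    Snp("rs1", "GENE_A", 1e-9),
    Snp("rs2", "GENE_B", 1e-9),
    Snp("rs3", "GENE_C", 0.2),
]
expression_p = {"GENE_A": 0.001, "GENE_B": 0.4, "GENE_C": 0.001}
candidates = select_candidate_genes(snps, expression_p)
```

Whole-word matching is used so that a short gene symbol does not match inside a longer one. A real literature search would also need gene synonyms and disease vocabularies, which this sketch leaves out.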
The selected pool of scientific articles was further curated to retain those that described the mechanisms of the genes. Specific keyword combinations were formulated to retrieve full-text articles that mentioned gene-protein or protein-protein interactions, or that discussed causal mechanisms in which the gene is an upstream or downstream node.
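One way to generate such keyword combinations systematically is shown below. The mechanism terms, the function name, and the generic boolean query syntax are assumptions for illustration; they are not the terms we used and are not tied to any particular literature search engine.

```python
from itertools import product

# Hypothetical mechanism keywords covering interactions and causal direction.
MECHANISM_TERMS = [
    "interacts with",
    "binds",
    "activates",
    "inhibits",
    "upstream of",
    "downstream of",
]


def build_queries(genes, disease, mechanism_terms=MECHANISM_TERMS):
    """Return one boolean query string per (gene, mechanism term) pair."""
    return [
        f'"{gene}" AND "{disease}" AND "{term}"'
        for gene, term in product(genes, mechanism_terms)
    ]


queries = build_queries(["GENE_A", "GENE_B"], "example disease")
```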
In cases where no relevant article linking the gene and the disease could be found, the search strategy was broadened to combine the gene with terms such as ‘comorbid disease’ or with other diseases that fall under the same category as the disease of interest. In addition, articles that described the association between the SNPs and the disease were also collected.
After finalizing the literature corpus, the causal mechanisms of all the genes were extracted into a computable format. Along with each cause-and-effect mechanism, we also added contextual information such as the experimental setup, the assay conducted, and the animal models (strain) used. These causal chains were used to build the knowledge assembly.
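To make "computable format" concrete, the sketch below stores each cause-and-effect statement together with its context and chains the statements into a graph. The `CausalStatement` fields, the relation names, and the example entries are hypothetical; they illustrate the idea and are not the format used in the project.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CausalStatement:
    source: str    # upstream node (cause)
    relation: str  # e.g. "increases" or "decreases" (assumed vocabulary)
    target: str    # downstream node (effect)
    citation: str  # article the statement was extracted from
    context: dict = field(default_factory=dict)  # assay, animal model, strain, ...


def build_graph(statements):
    """Index statements by their source node."""
    graph = defaultdict(list)
    for statement in statements:
        graph[statement.source].append(statement)
    return graph


def downstream(graph, start):
    """Return every node reachable from `start` by following causal statements."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for statement in graph.get(node, []):
            if statement.target not in seen:
                seen.add(statement.target)
                stack.append(statement.target)
    return seen


# Made-up statements with made-up context.
statements = [
    CausalStatement("GENE_A", "increases", "PROTEIN_B", "article-1",
                    {"assay": "western blot", "model": "mouse", "strain": "example strain"}),
    CausalStatement("PROTEIN_B", "decreases", "PROCESS_C", "article-2",
                    {"assay": "reporter assay"}),
]
graph = build_graph(statements)
affected = downstream(graph, "GENE_A")
```

Keeping the context on each statement means that a chain can later be filtered, for instance to statements observed in a given model organism.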
Based on enrichment against the causal mechanisms, we identified the associated pathways that these SNPs were perturbing. In this way, we added functional context around each statistically significant SNP and gene.
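The enrichment statistic is not detailed here, so the sketch below uses a generic hypergeometric over-representation test as a stand-in. The function names, the pathway sets, and the gene identifiers are assumptions for the example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(hit_genes, pathways, background):
    """Rank pathways by over-representation of the hit genes.

    Returns (pathway name, overlap size, p-value) tuples, smallest p-value first.
    """
    background = set(background)
    hits = set(hit_genes) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = hits & members
        p = hypergeom_sf(len(overlap), len(background), len(members), len(hits))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Made-up background of ten genes and two made-up pathways.
background = [f"g{i}" for i in range(10)]
pathways = {
    "PATHWAY_1": {"g0", "g1", "g2", "g3", "g4"},
    "PATHWAY_2": {"g5", "g6", "g7", "g8", "g9"},
}
ranked = enrich({"g0", "g1", "g2", "g3", "g4"}, pathways, background)
```

In practice the p-values would also need correction for multiple testing across pathways, which is omitted here.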
What did the customer gain?
Traditionally, the pharma research company had used conventional gene set enrichment analysis tools, such as GSEA and DAVID, to identify canonical pathways associated with the data. With our Causality knowledge models, the company could now interpret its data in a disease-specific context, rather than relying only on signaling pathways predicted by a functional enrichment analysis tool. The knowledge model we provided helped them identify disease-specific relationships that were missing from, or hidden in, the traditional data analysis.