wide range of correlations, both in the positive and negative ranges (P-values up to ~10⁻²², Fig. 3B).
3.2.2. The top correlated genes are associated with tumorigenesis
We examined in detail the strongest correlations between expression and mutation by selecting the top 10 positive correlations. For SKCM, all positive correlations originated from genes expressed at low levels; indeed, all genes with P-values <1 × 10⁻⁵ (105 total) were weakly expressed (mean ± SD 0.15 ± 0.19 log2(normalized RSEM + 1)). After excluding the genes with an average expression level <1.0, the top 10 best positive correlations were found in LUAD (Fig. 3C). These included 5 genes associated with the Gene Ontology term “Cell cycle” (BORA, CENPA, KIF2C, MCM10 and TPX2); EME1, involved in the Fanconi anemia pathway; HLTF, associated with the BRCA1 pathway; and KIFC1, MYBL2 and SPAG5, belonging to the cytoskeletal, EGF/EGFR and PI3K/Akt cell signaling pathways, respectively. Notably, none of these 10 genes was listed as a cancer gene in the COSMIC cancer gene census.
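The selection step described above (drop weakly expressed genes, then rank the remaining positive correlations) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the input layout and the use of Pearson's r as the ranking statistic are hypothetical, not taken from the analysis itself, which ranked genes by correlation P-value. Within a single tumor type, where every gene has the same number of patients, ranking by r and ranking by P-value give the same order.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: correlation undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)


def top_positive_genes(expression, mutation_load, min_mean=1.0, n_top=10):
    """Rank genes by positive correlation between expression and mutation load.

    expression: dict mapping gene name -> per-patient expression values,
                assumed to be in log2(normalized RSEM + 1) units.
    mutation_load: per-patient mutation counts, in the same patient order.
    min_mean: genes with mean expression below this value are excluded
              (1.0 mirrors the cutoff stated in the text).
    """
    ranked = []
    for gene, values in expression.items():
        if mean(values) < min_mean:
            continue  # weakly expressed gene, excluded
        r = pearson_r(values, mutation_load)
        if r > 0:
            ranked.append((gene, r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:n_top]
```

With toy data, a gene whose mean expression is below 1.0 is dropped even if it correlates perfectly with mutation load, which is the behavior the SKCM observation motivates.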
To assess whether expression of these 10 genes correlates with clinical outcome in LUAD, we applied the Kaplan-Meier (KM) estimator to patients with high (above average) and low (at or below average) gene expression levels; for 8 genes, patients with high expression fared worse than those with low expression (Fig. 3C). Also, expression levels for all 10 genes were higher in the tumors than in the matched control samples in LUAD (Fig. 3D), with all P-values <2 × 10⁻¹⁶. We extended the tumor/normal comparison of these 10 genes to all other tumor types for which at least 10 normal samples were available (15 including LUAD). In all cases TPX2, CENPA, KIFC1, MCM10 and MYBL2 were overexpressed in the tumor; EME1 and KIF2C were overexpressed in 14/15 tumors, BORA in 13/15, SPAG5 in 11/15 and HLTF in 9/15. Thus, the top 10 positively correlated genes are highly expressed in tumors.
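The survival comparison above rests on two steps: splitting patients at the average expression level and computing a Kaplan-Meier curve for each group. A minimal sketch of both steps is given below. The function names and input layout are hypothetical, and the code only produces the survival curves; the KM P-values reported in Fig. 3C would additionally require a significance test (for example a log-rank test), which is not shown here.

```python
from statistics import mean


def split_by_mean(expression):
    """Return (high, low) patient indices: high is above the mean,
    low is at or below the mean, as in the grouping described in the text."""
    cutoff = mean(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each time with >= 1 death.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve
```

In use, `split_by_mean` would be applied to the expression values of one gene, and `kaplan_meier` would then be run separately on the follow-up data of the high and low groups to obtain the two curves being compared.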
Of the top 10 genes most negatively correlated with mutation loads, the strongest association was found for MLH1 in ESCA (Fig. 3C). Mutations in MLH1 or its low expression are known for their role in tumorigenesis (Cancer Genome Atlas, 2012); however, none of the other 9 genes was listed in the COSMIC cancer gene census. Of note, low expression of PIGR and SPATA18, which are associated with “Innate immunity” and the “DNA damage response”, respectively, was found to be associated with poor prognosis in LUAD (Fig. 3C), and survival probability was further decreased for patients with combined low expression of both PIGR and SPATA18 (Fig. 3F). In summary, the top genes most strongly correlated with mutation loads reveal strong associations between deregulation of gene expression and poor survival in cancer.
3.2.3. Expression of genes in key pathways is altered across several tumor types
To further explore the involvement of the top genes correlated with mutations in tumorigenesis, we focused on two genes: MYBL2 for the positive correlations and SDHAF3 for the negative correlations. MYBL2 (MYB proto-oncogene like 2) codes for Myb-related protein 2; it is related to the MYB family of transcription factors and plays both a positive and a negative regulatory role in cell cycle progression. Among the genes regulated by MYBL2 are CENPA, KIF2C and KIFC1 (Fig. 3C), which have a role in centromere and microtubule-motor function during mitotic chromosome segregation and bipolar spindle formation. We addressed whether CENPA, KIF2C and KIFC1 were coexpressed with MYBL2, and found that the three genes were coexpressed with MYBL2 in all tumor types, with an average regression coefficient R of ~0.8 (Fig. 4A) and occasional R values exceeding 0.9 (Fig. 4B). As expected, P-values for the correlations were also highly significant, approaching zero in BRCA, with over
Fig. 3. Gene expression profiles correlate with cancer somatic mutations. Panel A, plot of regression coefficients (R, x axis) vs. P-values (y axis) for correlations between gene expression (all genes) and somatic mutations for patients with CHOL (black) or BRCA (red). Panel B, S-plots of P-values for the correlations between gene expression (all genes) and somatic mutations for 32 TCGA datasets. Panel C, list of top 10 genes whose mRNA levels (expression) were most strongly correlated with somatic mutations. GO term, selected gene ontology term from the human gene database GeneCards (https://www.genecards.org); Corr. P-value, P-values from panel B; KM P-value, P-value for the Kaplan-Meier plot; COSMIC CGC, whether or not the gene is listed as a cancer-promoting gene in the COSMIC cancer gene census. Panel D, Kaplan-Meier survival curves for LUAD patients with low (red) gene expression for PIGR and SPATA18 relative to LUAD patients (blue) with the other 3 combinations (high PIGR high SPATA18; low PIGR high SPATA18; high PIGR low SPATA18). Panel E, box plot of mRNA levels (y axis) in the tumor (blue) and normal (red) samples for the 10 genes with the strongest positive correlations between gene expression and somatic mutations for patients with LUAD. ***, P-value <2 × 10⁻¹⁶. Panel F, number of tumor types (y axis) in which gene expression for a given gene (x axis) was higher in the tumor than in matched normal control tissues (green, 15 total tumor types tested) and the number of instances (orange) in which the P-value for the difference was <2 × 10⁻¹⁶.