Gene Expression Microarray Data Meta-Analysis Identifies
Candidate Genes and Molecular Mechanism Associated
with Clear Cell Renal Cell Carcinoma
The first two authors equally contributed to this work.
Wang Y, Wei H, Song L, Xu L, Bao J, Liu J. Gene expression microarray data meta-analysis identifies candidate genes and molecular mechanism associated with clear cell renal cell carcinoma. Cell J. 2020; 22(3): 386-393. doi: 10.22074/cellj.2020.6561.
We aimed to explore potential molecular mechanisms of clear cell renal cell carcinoma (ccRCC) and provide candidate target genes for ccRCC gene therapy.
Materials and Methods
This is a bioinformatics-based study. The microarray datasets GSE6344, GSE781 and GSE53000 were downloaded from the Gene Expression Omnibus database. Using meta-analysis, differentially expressed genes (DEGs) were identified between ccRCC and normal samples, followed by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) function analyses. Then, protein-protein interaction (PPI) networks and modules were investigated. Furthermore, a miRNA-target gene regulatory network was constructed.
A total of 511 up-regulated and 444 down-regulated DEGs were identified in the present gene expression
microarray data meta-analysis. These DEGs were enriched in functions such as immune system process and pathways such as
the Toll-like receptor signaling pathway. A PPI network and eight modules were further constructed. A total of 10 outstanding
DEGs including TYRO protein tyrosine kinase binding protein (
TYROBP, IRF7 and PPARGC1A might play important roles in ccRCC via taking part in the immune system process.
Clear cell renal cell carcinoma (ccRCC) is a type of RCC that develops in adults (1). It has been reported that ccRCC is the most aggressive subtype of RCC (2). Surgery by radical or partial nephrectomy is the main treatment for ccRCC. Although chemotherapy and immunotherapy can be applied in patients with metastatic ccRCC, the outcome is unsatisfactory (3). Some ccRCC cases even carry the worst prognosis among the common epithelial tumors of the kidney (4). Therefore, molecular biomarkers that can serve as diagnostic and therapeutic targets, used alone or in combination with other clinical parameters, are urgently required for better clinical management.
Accumulating evidence suggests that certain
differentially expressed genes (DEGs) are closely related
to disease progression. Duns et al. (5) showed that
histone methyltransferase gene SET domain containing
In this study, meta-analysis was used to detect potential DEGs between ccRCC and normal samples based on three microarray datasets. Moreover, functional and pathway enrichment analyses were carried out for these DEGs. Then, a protein-protein interaction (PPI) network was investigated. Furthermore, a miRNA-target gene interaction network was constructed. We aimed to explore the underlying molecular mechanisms of ccRCC and provide candidate target genes for ccRCC gene therapy.
Materials and Methods
Microarray data and preprocessing
This is a bioinformatics-based study. Microarray
datasets were downloaded from the Gene Expression
Omnibus database and screened by data quantity, sample
grouping, microarray platform (Affymetrix) and number
of citations. Finally, three datasets were selected: GSE6344,
GSE781 and GSE53000. The reasons for selection were as
follows: i. Large data quantities, ii. Clear grouping of the
experiment (tumor vs. normal), iii. Common microarray
platforms (Affymetrix), iv. Consistent sample types
(tissue samples). In detail, 10 ccRCC and 10 normal
tissue samples profiled on the GPL96 [HG-U133A] Affymetrix Human Genome U133A Array platform were
selected for analysis in microarray dataset GSE6344 (10).
For GSE781 (11), 12 ccRCC and 5 normal tissue samples
profiled on the GPL96 [HG-U133A]
Affymetrix Human Genome U133A Array platform were selected.
In addition, all samples in GSE53000 (12) (56 ccRCC
and 6 normal tissue samples), profiled on the
[HuGene-1_0-st] Affymetrix Human Gene 1.0 ST
Array [transcript (gene) version] platform, were used for analysis.
Notably, GSE53000 included two samples of lymph
node metastasis and one sample of venous thrombus
metastasis. Thus, principal component analysis (PCA) was
carried out for the 56 ccRCC tissue samples and 6 normal
tissue samples. As shown in Figure S1, (Supplementary
Online Information at
Differentially expressed genes identification
DEGs between ccRCC and normal samples were screened across the multiple experimental datasets using the MetaDE package in R (14). A heterogeneity test was performed on the expression values of each gene across the different experimental platforms, yielding the statistical parameters tau2, Qvalue and Qpval. A tau2 of 0 [the estimated amount of (residual) heterogeneity] together with Qpval >0.05 (the P value of the heterogeneity test) indicated that a gene was homogeneous across datasets. Finally, a Benjamini-Hochberg adjusted P value (fdr) <0.05, tau2=0 and Qpval >0.05 were used as the cut-off criteria for DEG selection. Furthermore, a log2 FC of ccRCC vs. normal >0 denoted up-regulated DEGs, while a log2 FC of ccRCC vs. normal <0 denoted down-regulated DEGs.
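The selection rule above can be illustrated with a minimal Python sketch. It is not the MetaDE implementation: it assumes that the per-gene meta-analysis statistics (combined P value, tau2, Qpval and log2 FC) have already been computed elsewhere, and the `GeneStat` record and both function names are hypothetical. It only shows the Benjamini-Hochberg adjustment and the three cut-offs.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    """Hypothetical per-gene summary produced by a meta-analysis step."""
    gene: str
    p_value: float   # combined P value before multiple-testing correction
    tau2: float      # estimated between-dataset heterogeneity
    q_pval: float    # P value of the heterogeneity (Q) test
    log2_fc: float   # log2 fold change, ccRCC vs. normal


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values never increase as the raw P values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats, fdr_cut=0.05, q_pval_cut=0.05):
    """Split genes into up- and down-regulated DEGs using the stated cut-offs."""
    fdr = benjamini_hochberg([s.p_value for s in stats])
    up, down = [], []
    for s, q in zip(stats, fdr):
        # tau2 == 0 is an exact comparison, mirroring the criterion in the text.
        if q < fdr_cut and s.tau2 == 0 and s.q_pval > q_pval_cut:
            if s.log2_fc > 0:
                up.append(s.gene)
            elif s.log2_fc < 0:
                down.append(s.gene)
    return up, down
```

A gene with a small adjusted P value is still excluded if its heterogeneity statistics suggest that the datasets disagree, which is the purpose of combining the three criteria.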
Functional annotation and pathway enrichment analysis of differentially expressed genes
clusterProfiler is an R package for enrichment analysis (15). Gene Ontology (GO) functional annotation was performed with clusterProfiler to analyze the functions associated with the up- and down-regulated genes. GO functions include molecular function (MF), biological process (BP) and cellular component (CC). To better understand the pathways in which the DEGs are involved, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed using clusterProfiler. A P value (the significance threshold of the hypergeometric test) of <0.05 and a count (the number of enriched genes) of >2 were used as the cut-off criteria for this analysis.
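The hypergeometric test behind these cut-offs can be sketched in a few lines of Python. This is an illustrative over-representation test, not clusterProfiler's code; the function names and the background size used in any call are assumptions supplied by the caller.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, of which n_in_term belong to the term."""
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def passes_cutoff(p_value, count, p_cut=0.05, min_count=2):
    """Apply the stated criteria: P value < 0.05 and count > 2."""
    return p_value < p_cut and count > min_count
```

Exact integer binomial coefficients avoid floating-point underflow for the very small P values that arise when a term is strongly over-represented.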
Constructing protein-protein interaction network and modules analyses
PPIs play a key role in the completion of cellular functions, and interacting proteins are usually connected to each other in the form of a PPI network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a biological database of predicted and known PPIs. Based on this database, the PPI network of DEG-encoded proteins in each group was constructed with a combined score >0.4 (medium confidence) as the criterion, and it was then visualized with the Cytoscape (version 3.2.0) software (National Institute of General Medical Sciences, USA). Node scores in the network were calculated using degree centrality, a topological property index. A higher score indicates a more important node, which is more likely to be a hub node of the network. Furthermore, the MCODE tool (National Institute of General Medical Sciences, USA) (16) in Cytoscape was used to screen modules from the network.
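As a rough illustration of the degree centrality ranking, the sketch below filters scored protein pairs and counts the distinct neighbors of each node. It assumes that the pairs have already been exported from STRING with combined scores on a 0 to 1 scale; the function names and the toy protein labels are hypothetical, and this is not how Cytoscape computes the value internally.

```python
from collections import defaultdict


def build_ppi_network(scored_pairs, min_score=0.4):
    """Keep pairs whose combined score exceeds min_score; return adjacency sets."""
    neighbors = defaultdict(set)
    for protein_a, protein_b, score in scored_pairs:
        if score > min_score and protein_a != protein_b:
            neighbors[protein_a].add(protein_b)
            neighbors[protein_b].add(protein_a)
    return neighbors


def rank_by_degree(neighbors, top_n=10):
    """Return (protein, degree) tuples, highest degree first, ties by name."""
    degrees = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy input with invented labels and scores, for illustration only.
toy_pairs = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.8),
    ("P1", "P4", 0.7),
    ("P2", "P3", 0.5),
    ("P4", "P5", 0.3),  # dropped: score does not exceed 0.4
]
toy_ranking = rank_by_degree(build_ppi_network(toy_pairs))
```

Using sets for the adjacency lists means that a pair listed twice is counted once, so the degree equals the number of distinct interaction partners.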
miRNA-target gene regulatory network construction
Potential ccRCC-related miRNAs were explored using the Enrichr database. The miRNA-target gene regulatory network, linking these miRNAs to their associated up- and down-regulated genes, was constructed in Cytoscape software (National Institute of General Medical Sciences, USA).
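The network construction step amounts to intersecting each miRNA's target set with the DEG lists. The following sketch assumes that the miRNA-to-target mapping has already been obtained (for example, from an enrichment result); the function name and the edge format are hypothetical choices made for illustration.

```python
def build_mirna_target_edges(mirna_targets, up_genes, down_genes):
    """Return (miRNA, gene, direction) edges restricted to DEGs.

    mirna_targets maps each miRNA name to an iterable of target gene symbols.
    """
    up, down = set(up_genes), set(down_genes)
    edges = []
    for mirna, targets in sorted(mirna_targets.items()):
        for gene in sorted(set(targets)):
            if gene in up:
                edges.append((mirna, gene, "up"))
            elif gene in down:
                edges.append((mirna, gene, "down"))
            # Targets that are not DEGs are left out of the network.
    return edges
```

The resulting edge list can be written to a delimited text file, which is one of the network formats that Cytoscape can import.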
Differentially expressed genes investigation between clear cell renal cell carcinoma and control groups
With fdr <0.05, tau2=0 and Qpval >0.05, 955 DEGs were identified in the ccRCC group compared with the normal controls, including 511 up-regulated and 444 down-regulated genes.
Gene ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses
Using clusterProfiler, GO functional enrichment analysis was performed and the results showed that the up-regulated DEGs were significantly enriched in functions such as immune system process (GO_BP, P=5.02E-18), intracellular (GO_CC, P=5.86E-06) and protein binding (GO_MF, P=1.09E-18). Meanwhile, the down-regulated genes were mainly enriched in functions such as small molecule metabolic process (GO_BP, P=1.39E-39), cytoplasm (GO_CC, P=1.01E-06) and binding (GO_MF, P=3.81E-16) (Fig.1).
Pathway enrichment analysis showed that the up-regulated genes were enriched in pathways such as the Toll-like receptor signaling pathway (P=1.94E-05), while the down-regulated genes were enriched in pathways such as metabolic pathways (P=1.27E-28). The top five enriched pathways are listed in Table 1.
|Category||Pathway ID||Pathway||Count||P value|
|Up||hsa04620||Toll-like receptor signaling pathway||13||1.94E-05|
|hsa05150||Staphylococcus aureus infection||8||3.18E-04|
|hsa04666||Fc gamma R-mediated phagocytosis||10||5.19E-04|
|hsa00280||Valine, leucine and isoleucine degradation||15||1.04E-11|
P<0.05 was considered statistically significant. KEGG; Kyoto Encyclopedia of Genes and Genomes.
Protein-protein interaction network and modules investigation
To extract further information on the DEGs described above, a PPI network was constructed based on the interactions among their encoded proteins. With a combined score >0.4, a total of 2483 PPI pairs and 643 DEG-encoded proteins were identified. According to degree centrality, TYRO protein tyrosine kinase binding protein (TYROBP, degree=68, up-regulation), cathepsin S (CTSS, degree=53, up-regulation), colony stimulating factor 1 receptor (CSF1R, degree=52, up-regulation), Fc fragment of IgE receptor Ig (FCER1G, degree=43, up-regulation), protein tyrosine phosphatase, receptor type C (PTPRC, degree=43, up-regulation), mitogen-activated protein kinase 1 (MAPK1, degree=43, up-regulation), CD53 molecule (CD53, degree=42, up-regulation), Ras-related C3 botulinum toxin substrate 2 (RAC2, degree=42, up-regulation), cluster of differentiation 14 (CD14, degree=41, up-regulation) and cytochrome c, somatic (CYCS, degree=40, down-regulation) were the top 10 DEG-encoded proteins.
The MCODE tool identified eight modules in the PPI network. Detailed information is shown in Figure 2. To further investigate crucial pathways involved in ccRCC, KEGG pathway analysis was performed on the DEGs in these modules. The results showed that chemokine signaling pathway (P=4.88E-05), Parkinson’s disease (P=3.17E-24), protein digestion and absorption (P=3.31E-08) and ribosome (P=5.00E-08) were the most significant pathways enriched by DEG-encoded proteins in modules a, b, c and d, respectively. Meanwhile, Notch signaling pathway (P=3.99E-02), oxidative phosphorylation (P=7.89E-10), oxidative phosphorylation (P=5.74E-08) and adipocytokine signaling pathway (P=2.18E-02) were the most significant pathways enriched in modules e, f, g and h, respectively. The top two KEGG pathways in each module are listed in Table 2.
miRNAs-target gene regulatory network analyses
Here, miRNAs targeting the up- and down-regulated
genes were investigated using Enrichr.
Then, the regulatory network was constructed using
Cytoscape software. The results showed that the network contained
four miRNAs (miR-145, miR-199B,
miR-199A and miR-412), 94 up-regulated genes [such as
interferon regulatory factor 7 (
|Module ID||Pathway ID||Pathway name||Count||P value||Genes|
|a||hsa04062||Chemokine signaling pathway||6||4.88E-05||CXCL9, RAC2, CXCL16, HCK, GNG2...|
|a||hsa04060||Cytokine-cytokine receptor interaction||6||3.17E-04||CXCL9, IL10RA, CXCL16, TNFSF13B, CSF1R...|
|b||hsa05012||Parkinson’s disease||14||3.17E-24||NDUFB2, NDUFA3, UQCRFS1, NDUFA9, NDUFB8...|
|b||hsa00190||Oxidative phosphorylation||14||3.97E-24||NDUFB2, NDUFA3, UQCRFS1, NDUFA9, NDUFB8...|
|c||hsa04974||Protein digestion and absorption||4||3.31E-08||COL4A2, COL1A1, COL4A1, COL15A1|
|c||hsa04512||ECM-receptor interaction||3||1.15E-05||COL4A2, COL1A1, COL4A1|
|d||hsa03010||Ribosome||7||5.00E-08||RPL35A, RPS24, RPS15A, RPS16, RPL30...|
|d||hsa05150||Staphylococcus aureus infection||4||6.74E-05||ITGAM, C1QA, C3AR1, C1QC|
|e||hsa04330||Notch signaling pathway||1||3.99E-02||DTX3L|
|e||hsa04623||Cytosolic DNA-sensing pathway||1||3.99E-02||IRF7|
|f||hsa00190||Oxidative phosphorylation||6||7.89E-10||SDHB, COX5A, COX6A1, SDHC, ATP5G3...|
|f||hsa05012||Parkinson’s disease||5||9.97E-08||SDHB, COX5A, COX6A1, SDHC, ATP5G3|
|g||hsa00190||Oxidative phosphorylation||5||5.74E-08||ATP6V0D1, ATP6V0C, PPA2, ATP5C1, ATP6V1H...|
|g||hsa05110||Vibrio cholerae infection||3||3.94E-05||ATP6V0D1, ATP6V0C, ATP6V1H|
|h||hsa04920||Adipocytokine signaling pathway||3||2.18E-02||MTOR, PPARGC1A, PPARA|
KEGG; Kyoto Encyclopedia of Genes and Genomes.
ccRCC is one of the most common types of RCC in
adults and has the worst prognosis among the common
epithelial tumors of the kidney. In this study, a total of 955 DEGs
were identified between ccRCC and normal samples. GO
analysis showed that these DEGs, including up-regulated
The immune system protects the host against disease.
Aberrations of the immune system can lead to inflammatory
diseases, autoimmune diseases and cancer (17).
Importantly, ccRCC has shown extensive responses to
immune checkpoint blockade therapies (18). Based on
an in-depth immune profiling study, Chevrier et al. (19)
Abnormal immune response might be an important
mechanism of ccRCC.
There was no financial support for this study and there is no conflict of interest.
Y.W., H.W.; Contributed to the design of research, acquisition of data, analysis and interpretation of data, statistical analysis and drafting the manuscript. L.S., L.X., J.B.; Contributed to the analysis and interpretation of data and statistical analysis. J.L.; Contributed to the design of research. All authors read and approved the final manuscript.