Cancer Gene Promoter Related Motif Search

What is CAPRIS?
Cancer gene promoters
Motif extraction and store
CAPRIS Database search
Validation of SOM
People in CAPRIS

What is CAPRIS?

The cancer gene promoter related motif search (CAPRIS) database contains cancer gene classes that are grouped based on their 600bp promoter region (-500 to +100) and motifs extracted from these groups.

The CAPRIS database was constructed by analyzing 2036 cancer-related gene promoter sequences with machine learning techniques. The sequences were first labeled according to their relation to 23 different cancer types and then clustered with a self-organizing map (SOM). Merging neighboring SOM nodes produced a total of 168 clusters across the 23 cancer types. Finally, these cancer gene promoter groups were analyzed with the MEME motif extraction tool to obtain common sequence motifs.

The 168 clusters and their motifs are stored in a searchable and downloadable relational database. The database can be queried by gene name, ID, cancer type, or nucleotide sequence fragment using an internal BLAST. Users can also run NCBI-BLAST against the extracted motifs as well as the 2036 cancer-related gene promoter sequences in the selected 600 bp region.



Cancer gene promoters

     Cancer types were categorized according to the Cancer by Body Location/System listing of the National Cancer Institute (U.S. National Institutes of Health, www.cancer.gov). On this basis, we composed 23 different classes of cancer types (Table 2.1). Cancer-specific genes were then extracted from NCBI (http://www.ncbi.nlm.nih.gov/). The Gene Bibliography Database in NCBI was searched to determine whether a gene is related to any of the cancer types listed below, and all gene references were scanned using regular expressions. A total of 5474 gene-cancer relations were extracted from this database. Because different cancer types share genes, these relations correspond to 2080 distinct genes.

Table 2.1 23 Cancer Types

1. ALL
2. AML
3. Bone
4. Brain
5. Breast
6. CLL
7. CML
8. Digestive
9. Endocrine
10. Eye
11. Genitourinary
12. Germcell
13. Gynecologic
14. Head and Neck
15. Hodgkins Lymphoma
16. Leukemia
17. Lung
18. Lymphoma
19. Musculoskeletal
20. Neurologic
21. Non-Hodgkins Lymphoma
22. Respiratory
23. Skin
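
The regular-expression scan of gene references described before Table 2.1 can be illustrated with a minimal Python sketch. The patterns, gene symbols and reference titles below are hypothetical examples; the expressions actually used to build CAPRIS are not given in the text. The sketch also shows why the number of gene-cancer relations is larger than the number of distinct genes.

```python
import re

# Hypothetical keyword patterns for three of the 23 classes.
# The real CAPRIS expressions are not published in this text.
CANCER_PATTERNS = {
    "Breast": re.compile(r"\bbreast\s+(?:cancer|carcinoma|tumou?r)", re.IGNORECASE),
    "Lung": re.compile(r"\blung\s+(?:cancer|carcinoma|tumou?r)", re.IGNORECASE),
    "CML": re.compile(r"\bchronic\s+myelo(?:genous|id)\s+leuka?emia", re.IGNORECASE),
}


def cancer_types_for(reference_titles):
    """Return the sorted cancer classes whose pattern matches any title."""
    found = set()
    for title in reference_titles:
        for cancer_type, pattern in CANCER_PATTERNS.items():
            if pattern.search(title):
                found.add(cancer_type)
    return sorted(found)


def build_relations(gene_to_titles):
    """Return (gene, cancer type) pairs, one per matched class per gene."""
    relations = []
    for gene, titles in gene_to_titles.items():
        for cancer_type in cancer_types_for(titles):
            relations.append((gene, cancer_type))
    return relations


# Invented reference titles, for illustration only.
example = {
    "TP53": ["p53 mutations in breast cancer", "TP53 status in lung carcinoma"],
    "ABL1": ["BCR-ABL in chronic myelogenous leukemia"],
    "ACTB": ["Beta-actin as a housekeeping gene"],
}

relations = build_relations(example)
unique_genes = {gene for gene, _ in relations}
print(len(relations), "relations,", len(unique_genes), "distinct genes")
```

A gene that matches several classes contributes one relation per class but is counted once as a gene, which is the same effect that reduces 5474 relations to 2080 genes.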

 

Table 2.2 Subcancer Types

1. ALL: Acute Lymphoblastic Leukemia
2. AML: Acute Myeloid Leukemia
3. Bone: Ewing's Family of Tumors, Osteosarcoma
4. Brain: Brain Tumor, Brain Stem Glioma, Cerebral Astrocytoma, Malignant Glioma, Ependymoma, Medulloblastoma, Pineoblastoma, Hypothalamic Glioma
5. Breast: Breast Cancer
6. CLL: Chronic Lymphocytic Leukemia
7. CML: Chronic Myelogenous Leukemia
8. Digestive/Gastrointestinal: Anal Cancer, Bile Duct Cancer, Carcinoid Tumor, Colon Cancer, Esophageal Cancer, Gallbladder Cancer, Liver Cancer, Pancreatic Cancer, Rectal Cancer, Stomach Cancer
9. Endocrine: Adrenocortical Carcinoma, Carcinoid Tumor, Islet Cell Carcinoma, Parathyroid Cancer, Pheochromocytoma, Pituitary Tumor, Thyroid Cancer
10. Eye: Melanoma, Retinoblastoma
11. Genitourinary: Bladder Cancer, Kidney Cancer, Penile Cancer, Prostate Cancer, Renal Pelvis & Ureter Cancer, Testicular Cancer, Urethral Cancer
12. Germcell: Germ Cell Tumor, Testicular Cancer
13. Gynecologic: Cervical Cancer, Endometrial Cancer, Gestational Trophoblastic Tumor, Ovarian Epithelial Cancer, Ovarian Germ Cell Tumor, Uterine Sarcoma, Vaginal Cancer, Vulvar Cancer
14. Head and Neck: Hypopharyngeal Cancer, Laryngeal Cancer, Lip & Oral Cavity Cancer, Neck Cancer, Nasopharyngeal Cancer, Oropharyngeal Cancer, Parathyroid Cancer, Salivary Gland Cancer, Paranasal Sinus & Nasal Cavity Cancer
15. Hodgkins Lymphoma: Hodgkin's Lymphoma
16. Leukemia: Acute Lymphoblastic Leukemia, Acute Myeloid Leukemia, Chronic Lymphocytic Leukemia, Hairy Cell Leukemia, Chronic Myelogenous Leukemia
17. Lung: Lung Cancer
18. Lymphoma: T-Cell Lymphoma, Hodgkin's Lymphoma, Mycosis Fungoides, Non-Hodgkin's Lymphoma, Sezary Syndrome, Nervous System Lymphoma, Waldenström's Macroglobulinemia
19. Musculoskeletal: Ewing's Family of Tumors, Osteosarcoma, Rhabdomyosarcoma, Soft Tissue Sarcoma, Uterine Sarcoma
20. Neurologic: Brain Tumor, Brain Stem Glioma, Cerebellar Astrocytoma, Ependymoma, Medulloblastoma, Pineoblastoma, Neuroblastoma, Pituitary Tumor, Visual Pathway & Hypothalamic Glioma
21. Non-Hodgkins Lymphoma: Non-Hodgkin's Lymphoma
22. Respiratory/Thoracic: Lung Cancer, Malignant Mesothelioma, Thymoma & Thymic Carcinoma
23. Skin: Cutaneous T-Cell Lymphoma, Kaposi's Sarcoma, Melanoma, Merkel Cell Carcinoma, Skin Cancer

     Only the cancer-specific gene IDs (numbers) and symbol names could be extracted from the NCBI database. The next step was therefore to find the RefSeq identifiers of the genes from these symbol names, using the HUGO Gene Symbol List table. The extracted RefSeq IDs were then fed into Promoser, a large-scale mammalian promoter and transcription start site identification service, to obtain the promoter sequences of these cancer genes. The length of the extracted promoter sequences was set to 600 bp (500 bp upstream and 100 bp downstream of the transcription start site, i.e. -500 to +100).
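
To make the -500 to +100 window concrete, here is a minimal Python sketch of how such a region could be cut from a chromosome sequence once the transcription start site (TSS) is known. It is not Promoser's implementation: the function names, the 0-based coordinate convention, and the choice to treat the TSS base as position +1 (with no position 0) are assumptions made for illustration.

```python
# Complement table for reverse-complementing minus-strand windows.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(tss, strand, upstream=500, downstream=100):
    """Return a 0-based, half-open (start, end) interval around the TSS.

    tss is the 0-based forward-strand index of the TSS base (position +1).
    The interval covers `upstream` bases before the TSS and `downstream`
    bases starting at the TSS, measured along the gene's own strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher forward coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start; a clipped window is shorter than 600 bp.
    return max(start, 0), end


def extract_promoter(chrom_seq, tss, strand, upstream=500, downstream=100):
    """Return the promoter sequence in the gene's 5'-to-3' orientation."""
    start, end = promoter_window(tss, strand, upstream, downstream)
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


# Toy chromosome, for illustration only.
chrom = "ACGT" * 500
plus = extract_promoter(chrom, tss=1000, strand="+")
minus = extract_promoter(chrom, tss=1000, strand="-")
print(len(plus), len(minus))
```

With this convention the TSS base always sits at index 500 of the returned string, regardless of strand, so sequences from different genes are aligned on their start sites before clustering.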

     Table 2.3 shows, for each cancer type, the number of cancer genes found in NCBI and the number of promoter sequences extracted for them. All genes extracted by the process described above can be accessed by gene ID, name, cancer type(s) and promoter region on the Gene Resources page (http://www.i-cancer.org/gensor.htm).

 

Table 2.3 Extracted Sequence numbers

Cancer Type              # of cancer genes from NCBI   # of promoter sequences extracted
ALL                      127                           121
AML                      191                           187
Bone                     76                            76
Brain                    263                           262
Breast                   579                           571
CLL                      113                           107
CML                      41                            41
Digestive                634                           619
Endocrine                77                            77
Eye                      334                           332
Genitourinary            475                           467
Germcell                 25                            25
Gynecologic              131                           130
Head and Neck            38                            38
Hodgkins Lymphoma        39                            39
Leukemia                 590                           576
Lung                     289                           287
Lymphoma                 270                           265
Musculoskeletal          98                            98
Neurologic               389                           387
Non-Hodgkins Lymphoma    41                            41
Respiratory              318                           314
Skin                     336                           334
Total                    5474                          5394
Total genes in CAPRIS    2080                          2036

 



Motif extraction and store

    Similar promoter region sequences from the same cancer type were clustered using a Self Organizing Map (SOM) (Figure 3.1). Each SOM node represents a group of genes with similar promoter sequences within that particular cancer type. Neighboring SOM nodes were then merged, giving a total of 168 clusters for the 23 cancer types.

Figure 3.1
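
The SOM clustering step can be sketched in plain Python as follows. The text does not say how promoter sequences were encoded or how the map was configured, so the dinucleotide-frequency encoding, the 3x3 grid, the learning schedule and all function names here are assumptions chosen only to show the technique; merging of neighboring nodes into final clusters is not included.

```python
import itertools
import math
import random


def kmer_profile(seq, k=2):
    """Encode a DNA sequence as normalized k-mer frequencies (assumed encoding)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def best_matching_unit(weights, vector):
    """Return the grid coordinate of the node whose weights are closest."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighborhood radius
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * h * (vector[j] - w[j])
    return weights


def cluster(vectors, weights):
    """Group sequence indices by their best matching SOM node."""
    groups = {}
    for index, vector in enumerate(vectors):
        groups.setdefault(best_matching_unit(weights, vector), []).append(index)
    return groups


# Toy sequences, for illustration only.
sequences = ["ATATAT" * 20, "ATATAT" * 20, "GCGCGC" * 20]
vectors = [kmer_profile(s) for s in sequences]
som = train_som(vectors)
groups = cluster(vectors, som)
print(groups)
```

Each key of `groups` is a map node and each value lists the sequences assigned to it, which corresponds to the statement above that a SOM node represents a group of genes with similar promoter sequences.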

 

 

     We then analyzed these cancer gene promoter groups with the MEME motif extraction tool to obtain common sequence motifs (Figure 3.2). The cancer gene groups, clustered according to their selected promoter region, were stored together with their extracted motifs in a searchable and downloadable MySQL relational database.

Figure 3.2 MEME result for a cancer gene promoter group

 



CAPRIS Database search

     The CAPRIS web page provides two different databases. Motif search contains the cancer gene groups, clustered according to each gene's selected promoter region, together with their extracted motifs. Gene Resources contains all of the final genes extracted from the CancerGene Database, each with its gene ID, name, cancer type(s) and promoter region.


Figure 4.1 Use Case Diagram of CAPRIS

 


 


Validation of SOM

     For each cancer type, the genes from all of its SOM clusters were pooled again and re-clustered with k-means in MATLAB, giving two different clusterings of the same genes. The two clusterings were then compared using the Rand Index (Table 5.1).

P and Q are two different clusterings of the same points.

N_{11}: number of point pairs that are in the same cluster under both P and Q
N_{00}: number of point pairs that are in different clusters under both P and Q
N_{10}: number of point pairs in the same cluster under P but not under Q
N_{01}: number of point pairs in the same cluster under Q but not under P

Rand Index:

R(P, Q) = \frac{N_{11} + N_{00}}{N_{11} + N_{00} + N_{10} + N_{01}}
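
The pair counts defined above translate directly into code. The following Python sketch assumes the conventional definition of the Rand Index, the share of point pairs on which P and Q agree, i.e. (N11 + N00) divided by the total number of pairs; the function name and the toy labelings are illustrative and are not taken from the CAPRIS MATLAB analysis.

```python
from itertools import combinations


def rand_index(labels_p, labels_q):
    """Return the Rand Index of two clusterings given as label sequences.

    labels_p[i] and labels_q[i] are the cluster labels of point i under
    clusterings P and Q. Label values themselves do not need to match.
    """
    if len(labels_p) != len(labels_q):
        raise ValueError("both clusterings must label the same points")
    n11 = n00 = n10 = n01 = 0
    for i, j in combinations(range(len(labels_p)), 2):
        same_p = labels_p[i] == labels_p[j]
        same_q = labels_q[i] == labels_q[j]
        if same_p and same_q:
            n11 += 1  # together under both P and Q
        elif not same_p and not same_q:
            n00 += 1  # separated under both P and Q
        elif same_p:
            n10 += 1  # together under P only
        else:
            n01 += 1  # together under Q only
    total = n11 + n00 + n10 + n01
    if total == 0:
        raise ValueError("at least two points are required")
    return (n11 + n00) / total


# Same partition with swapped label names: full agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

A value of 1 means the two clusterings group every pair of genes identically, so values near 1 indicate that the k-means result closely reproduces the SOM clusters.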

Table 5.1 Rand Index of Cancer Types

Cancer Type      Rand Index    Cancer Type              Rand Index
ALL              0.96          Gynecologic              0.97
AML              0.95          Head and Neck            1
Bone             0.78          Hodgkins Lymphoma        1
Brain            0.93          Leukemia                 0.95
Breast           0.93          Lung                     0.92
CLL              0.96          Lymphoma                 0.91
CML              1             Musculoskeletal          0.98
Digestive        0.87          Neurologic               0.94
Endocrine        1             Non-Hodgkins Lymphoma    0.74
Eye              0.94          Respiratory              0.90
Genitourinary    0.90          Skin                     0.92
Germcell         1

 


 


People in CAPRIS

Name                   Institution
Rengul Cetin Atalay    Bilkent University
Volkan Atalay          Middle East Technical University
Allan Dickerman        Virginia Bioinformatics Institute
Steve Akman            Wake Forest University School of Medicine
M. Erkut Erdem         Middle East Technical University
I. Aykut Erdem         Middle East Technical University
Murat Iskar            Bilkent University
