Help


FULL COLLECTION AND SEARCH

The search (see the "Full Collection & Search" page) can be performed using: an EST identifier ("ChimeraID"); the names of the genes participating in the chimeras ("Gene Name/ID(s)", e.g. LMNA, DDX5, or their NCBI Entrez Gene IDs, 4000 and 1655 respectively; or "Gene UniProtKB AC", e.g. P02545, B5BUE6); a sequence identity score ("Identity", e.g. 100, 95); a tissue type ("Tissue Name", e.g. lung); or a keyword ("Keyword", e.g. RARA). The full collection can be listed by selecting the "ChiTaRS Full Collection" option and clicking on "Search". All of the approximately 111,600 entries (as of 2019/09/04) in the databases for Homo sapiens, Mus musculus, Drosophila melanogaster, Rattus norvegicus, Bos taurus, Danio rerio, Saccharomyces cerevisiae and Sus scrofa are then listed together.

The search results page shows all the relevant entries for the available chimeric transcripts, the RNA-seq reads mapping to the chimeric junction site, the level of transcript expression and the cancer breakpoints (see the pop-up windows opened by clicking on the "RNA-seq" column in the "Full Collection" table). Each entry gives the identifier with a link to the corresponding GenBank entry, the junction site, the gene names and the identity of the two genes incorporated into the chimera.

The expression level of each transcript is calculated with the RPKM measure (Mortazavi et al., 2008): the number of reads mapping to the junction site of the transcript, divided by the total number of reads in millions, divided by the length of the junction site in nucleotides (138 nt for human and fruit fly, 88 nt for mouse) and multiplied by a thousand (to express the length in kb).
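The calculation above can be sketched in a few lines of Python. This is only a minimal illustration of the formula as described here, not the code used to build the database; the function name `junction_rpkm` and the `JUNCTION_LENGTH_NT` table are hypothetical names introduced for the example.

```python
# Junction-site lengths in nucleotides, as stated in the text above.
JUNCTION_LENGTH_NT = {"human": 138, "fruit fly": 138, "mouse": 88}


def junction_rpkm(junction_reads, total_reads, organism):
    """RPKM of a chimeric junction site (hypothetical helper).

    junction_reads: number of reads mapping to the junction site
    total_reads:    total number of reads in the RNA-seq sample
    organism:       key into JUNCTION_LENGTH_NT
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    length_nt = JUNCTION_LENGTH_NT[organism]
    # Reads per million reads in the sample.
    reads_per_million = junction_reads / (total_reads / 1_000_000)
    # Divide by the junction length expressed in kilobases.
    return reads_per_million / (length_nt / 1_000)
```

For example, `junction_rpkm(44, 2_000_000, "mouse")` divides 22 reads per million by 0.088 kb.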


Ranking of chimeric junction consistency

One of the key novelties of our database is the calculation and ranking of chimeric junction consistency. The ChiTaRS database contains transcripts that are chimeras of two genes, and in some cases there is evidence that the same two genes participate in many chimeras. The junction consistency rank measures how many times the same junction between the same two genes has been found in chimeric transcripts. Thus, if the junction sites of several chimeras fall at the same genomic location in both incorporated genes, differing by no more than 1,000 nucleotides (an empirical threshold that can be changed in the "Search" options), the junction rank is high (see “Full Collection & Search”). Junction consistency is a particularly useful feature for experimental work: highly ranked chimeras are good candidates for verification in cells by PCR, RT-qPCR or other techniques, which reduces the chance of dealing with chimeras that are mere artifacts.
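One way to picture the ranking is the following Python sketch. It assumes that each chimera is described by its two genes and one junction coordinate per gene, and that the rank of a chimera is the number of chimeras (itself included) joining the same two genes within the tolerance on both sides. The record layout, the names `Chimera` and `junction_consistency`, and this exact counting rule are assumptions made for illustration; the ranking actually used in ChiTaRS may differ in detail.

```python
from collections import defaultdict
from typing import NamedTuple


class Chimera(NamedTuple):
    """Hypothetical record: one chimeric transcript and its junction."""
    chimera_id: str
    gene_a: str
    pos_a: int  # junction coordinate in the first gene
    gene_b: str
    pos_b: int  # junction coordinate in the second gene


def junction_consistency(chimeras, tolerance=1000):
    """Count, per chimera, how many chimeras share its junction.

    Two chimeras share a junction when they join the same two genes and
    their junction coordinates differ by at most `tolerance` nucleotides
    in both genes. The count includes the chimera itself.
    """
    by_gene_pair = defaultdict(list)
    for chimera in chimeras:
        by_gene_pair[(chimera.gene_a, chimera.gene_b)].append(chimera)

    ranks = {}
    for group in by_gene_pair.values():
        for chimera in group:
            ranks[chimera.chimera_id] = sum(
                1
                for other in group
                if abs(other.pos_a - chimera.pos_a) <= tolerance
                and abs(other.pos_b - chimera.pos_b) <= tolerance
            )
    return ranks
```

The default of 1,000 nucleotides mirrors the empirical threshold mentioned above; passing a different `tolerance` corresponds to changing it in the "Search" options.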

Graphical View of Fusion (SpliceGraphs)

A bonus feature of ChiTaRS is the visualization of chimeric transcripts and their genomic context, including the junction site. The figures were produced using the SpliceGrapher package, which was designed to predict splice graphs for a gene by combining evidence from RNA-Seq data, annotated gene models and EST alignments. To produce splice graphs for chimeras, we first used GMAP to align the ESTs to a reference genome (H. sapiens version GRCh37.62, D. melanogaster version BDGP R5/dm3 and M. musculus version NCBI37/mm9) and then used SpliceGrapher to convert the resulting alignments into splice graphs. Finally, we used SpliceGrapher's visualization modules to integrate the ESTs and gene models into figures that illustrate chimeric splicing. Each figure shows how the ESTs align across two genes, making it possible to envisage the potential transcripts that could arise from each chimera (see “Full Collection and Search”).


RNA-seq evidence of the human, mouse and fruit fly chimeras

The ChiTaRS database provides evidence of chimeric transcripts, and of their coverage by RNA-seq reads, for three higher eukaryotes: human, mouse and fruit fly. This makes it possible to investigate transcripts that incorporate the same orthologous genes in different organisms. An interesting example is the human chimera, ChimeraID='AW882230', and the mouse chimera, ChimeraID='CF577921'. Both incorporate the PTMS gene (Parathymosin, which may mediate immune function) and both are confirmed by RNA-seq reads in their respective organisms. This RNA-seq coverage therefore provides evidence of chimeras that involve the same orthologous genes in different eukaryotes and that might be conserved during evolution. ChiTaRS takes the first step in this direction, and one of its main future goals is the study of the evolution of chimeric transcripts.

Chimeras with the Hi-C points

High-throughput chromosome conformation capture (Hi-C) is a method for identifying chromatin interactions across an entire genome. Hi-C experiments measure the frequencies of contacts between all pairs of loci in the genome, so that interactions between all possible pairs of fragments are quantified simultaneously. The ChiTaRS database provides evidence of Hi-C points in the chimeras.

The ChiTaRS database includes Hi-C chromatin contact maps from public datasets for four organisms: human, mouse, fruit fly and yeast. We have included nearly 5,600 chimeras, spanning 14 cell lines and tissue samples. The following Hi-C resources were used in the ChiTaRS database:

Organism  GEO link   Library/cell line  .hic file(s)
Human     GSE63525   GM12878            [GSE63525 GM12878 dilution combined], [GM12878 combined in situ from Cell 2014]
Human     GSE63525   HUVEC              [GSE63525 HUVEC combined], [HUVEC 1in situ combined]
Human     GSE63525   NHEK               [NHEK combined], [NHEK 1in situ combined]
Human     GSE63525   HMEC               [HMEC combined], [HMEC 1in situ combined]
Human     GSE63525   K562               [K562 combined], [K562 in situ combined]
Human     GSE63525   KBM-7              [KBM7 combined], [KBM7 in situ combined]
Human     GSE63525   IMR90              [IMR90 combined], [IMR90 1in situ combined]
Human     GSE63525   HeLa               [HeLa in situ combined]
Human     GSE35156   hESC               [hESC.txt.gz], [hESC replicate.txt.gz], [hESC combined.hic]
Human     GSE35156   IMR90              [IMR90.txt.gz], [IMR90 replicate.txt.gz], [IMR90 combined.hic]

Mouse     GSE71831   Patski             [Patski], [PNAS 2016 Patski combined]
Mouse     GSE96692   C57BL/6            [GSM2544836_No_Tx]
Mouse     GSE96692   C57BL/6            [GSM2544837_TAC]
Mouse     GSE96692   C57BL/6            [GSM2544839_Tx_Cre-plus]
Mouse     GSE119171  F123               [GSE119171_JL]
Mouse     GSE63525   CH12-LX            [GSE63525_CH12-LX]

Fly       GSE89244   Kc167              [GSM2362844_CP190_HiChIP], [CP190_HiChIP]
Fly       GSE89112   Kc167              [GSE89112_Kc167combined], [GSE89112_Kc167combined]

Yeast     SRP053245  SRR1791297         [SRR1791297]
Yeast     SRP053245  SRR1791299         [SRR1791299]


Mass-spec peptides mapping

Sense and anti-sense strands of the same open-reading frame

Unique Gene Names

The many aliases in use for the same gene are one of the main problems when working with different gene, protein and transcript databases, and they can be a source of duplicate entries. In ChiTaRS, we use a dedicated table to map synonymous gene names to a unique record, with the NCBI Entrez gene name as the key. So far we have performed four updates of the ChiTaRS database, each following manual verification of the entries and cancer breakpoints. In every update, synonymous gene names are checked automatically so that the name of each of the two genes incorporated into a chimera is unique. Thus, all entries currently in ChiTaRS have unified gene names and, as a result, searches can be performed using either gene names or their synonyms (under "Full Collection & Search").
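The idea of a synonym table can be illustrated with a small Python sketch. The table contents, the upper-casing rule and the names `SYNONYM_TO_ENTREZ_NAME`, `unify_gene_name` and `unify_chimera_genes` are assumptions made for this example only (the two aliases shown are illustrative); they do not reproduce the actual ChiTaRS table.

```python
# Illustrative synonym table: alias -> NCBI Entrez gene name (the key).
# The real ChiTaRS table is far larger; these entries are examples only.
SYNONYM_TO_ENTREZ_NAME = {
    "LMNA": "LMNA",
    "LMN1": "LMNA",  # illustrative alias
    "DDX5": "DDX5",
    "P68": "DDX5",   # illustrative alias
}


def unify_gene_name(name):
    """Return the unique Entrez gene name for a name or synonym.

    Returns None when the name is not in the table, so that unknown
    names can be flagged for manual verification.
    """
    return SYNONYM_TO_ENTREZ_NAME.get(name.strip().upper())


def unify_chimera_genes(gene_a, gene_b):
    """Unify both gene names of a chimera; fail loudly on unknown names."""
    unified = (unify_gene_name(gene_a), unify_gene_name(gene_b))
    if None in unified:
        raise KeyError(f"unknown gene name in pair: {gene_a!r}, {gene_b!r}")
    return unified
```

With such a table, a search for a synonym and a search for the Entrez name resolve to the same record, which is what prevents duplicate entries for the same gene pair.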


BREAKPOINTS

Verbatim Search for the Breakpoints

Cancer Types

The ChiTaRS study encompassed 33 cancer types for which data were available for at least one breakpoint/fusion:

Hematologic and lymphatic malignancies included:
(LAML) acute myeloid leukemia,
(DLBC) lymphoid neoplasm diffuse large B cell lymphoma,
(THYM) thymoma;
Solid tumor types included:
(OV) ovarian,
(UCEC) uterine corpus endometrial carcinoma,
(CESC) cervical squamous cell carcinoma and endocervical adenocarcinoma,
(BRCA) breast invasive carcinoma;
Urologic types included:
(BLCA) bladder urothelial carcinoma,
(PRAD) prostate adenocarcinoma,
(TGCT) testicular germ cell tumors,
(KIRC) kidney renal clear cell carcinoma,
(KICH) kidney chromophobe,
(KIRP) kidney renal papillary cell carcinoma;
Endocrine types included:
(THCA) thyroid carcinoma,
(ACC) adrenocortical carcinoma;
Core gastrointestinal types included:
(ESCA) esophageal carcinoma,
(STAD) stomach adenocarcinoma,
(COAD) colon adenocarcinoma,
(READ) rectum adenocarcinoma;
Developmental gastrointestinal types included:
(LIHC) liver hepatocellular carcinoma,
(PAAD) pancreatic adenocarcinoma,
(CHOL) cholangiocarcinoma;
Head and neck included:
(HNSC) head and neck squamous cell carcinoma;
Thoracic organ systems included:
(LUAD) lung adenocarcinoma,
(LUSC) lung squamous cell carcinoma,
(MESO) mesothelioma;
Cancers of the central nervous system included:
(GBM) glioblastoma multiforme,
(LGG) brain lower-grade glioma;
Soft tissue types included:
(SARC) sarcoma,
(UCS) uterine carcinosarcoma;
Cancers from neural-crest-derived tissues included:
(PCPG) pheochromocytoma and paraganglioma,
(SKCM) skin cutaneous melanoma (melanocytic cancers of the skin);
Eye type included:
(UVM) uveal melanoma.

For a complete list of the TCGA cancer-type abbreviations, please see here.


COMPARE AND ANALYZE

This option makes it possible to observe conserved junction sites between two organisms of interest. Users can choose two organisms and analyze the conservation of junction sites in the chimeras of these organisms using the rank and the consistency analysis.


JUNCTION SEARCH

"Junction Search" provides the option to screen through the list of RNA-seq reads found at the chimeras’ junction sites to identify putative junction sites in novel sequences provided by a user. The "DNA search" is available for all three organisms in the database, and both the transcript sequence and the GenBank accession number can be used as inputs. The search is an automatic procedure that identifies a junction site in the transcript entered by a user and that aligns the previously found "chimeric" RNA-seq reads to this junction site. This special feature of ChiTaRS allows users to identify to what extent their chimeric transcripts are similar to those for which there is RNA-seq data in the database. It is essential for scientists to be able to analyze their chimeras in the complex setting of a large high throughput dataset and with multiple sequences. In the downloads section we provide all the potential "chimeric" reads, which enables the user to search for junction coverage among other available chimeric transcripts in the different databases.
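The principle of checking which reads cover a junction can be sketched as follows. This is a deliberately simplified illustration, not the procedure run by "Junction Search": it assumes the junction position in the user's transcript is already known, and it uses exact substring matching where a real search would use an aligner that tolerates mismatches. The function name `reads_spanning_junction` and the `min_overhang` parameter are hypothetical.

```python
def reads_spanning_junction(transcript, junction, reads, min_overhang=10):
    """Find reads that match the transcript exactly and cross the junction.

    transcript:   transcript sequence supplied by the user
    junction:     0-based index of the first base after the junction
    reads:        iterable of read sequences
    min_overhang: minimum number of bases a read must cover on each
                  side of the junction to count as spanning it

    Returns a list of (read, start) pairs, where start is the 0-based
    position of the read in the transcript.
    """
    hits = []
    for read in reads:
        start = transcript.find(read)
        while start != -1:
            end = start + len(read)
            # The read must extend min_overhang bases on both sides.
            if start + min_overhang <= junction <= end - min_overhang:
                hits.append((read, start))
            start = transcript.find(read, start + 1)
    return hits
```

Requiring an overhang on both sides is a common way to avoid counting reads that merely touch the junction; the default of 10 bases is an arbitrary choice for the example.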


DOWNLOADS

The ChiTaRS database not only provides extended "Search" options, but also offers the possibility to download all the database tables and the datasets in a user-friendly manner. The full human, mouse and fruit fly collections include information on the two genes incorporated into the chimeras, the sequence identity and the positions of the junction sites. In addition, the freely available RNA-seq results, all the unmapped "chimeric" RNA-seq reads and the mass-spectrometry results can be downloaded for each organism, as can the full collections of all chimera genes for the UniProtKB.
