|FULL COLLECTION AND SEARCH||BREAKPOINTS||COMPARE AND ANALYZE||JUNCTION SEARCH|
The search (see the "Full Collection & Search" page) can be performed using: an EST ID ("ChimeraID"), the names of the genes participating in the chimeras ("Gene Name/ID(s)", e.g. LMNA, DDX5, or their NCBI Entrez IDs, 4000 and 1655 respectively; or "Gene UniProtKB AC", e.g. P02545, B5BUE6), a sequence identity score ("Identity", e.g. 100, 95), a tissue type ("Tissue Name", e.g. lung), or a keyword ("Keyword", e.g. RARA). The full collection can be obtained by selecting the "ChiTaRS Full Collection" option and clicking on "Search". All of the approximately 111,600 entries (as of 2019/09/04) in the database for Homo sapiens, Mus musculus, Drosophila melanogaster, Rattus norvegicus, Bos taurus, Danio rerio, Saccharomyces cerevisiae and Sus scrofa are then listed together.
The search results page shows all the relevant entries associated with the available chimeric transcripts, the RNA-seq reads mapping to the chimeric junction site, the level of transcript expression and the cancer breakpoints (see the pop-up windows that open on clicking the "RNA-seq" column in the "Full Collection" table). It contains detailed information about the identifier and the link to the corresponding GenBank entry, the junction site, the gene names and the identity of the two genes incorporated into the chimera.
The expression level of each transcript is calculated using the RPKM measure (Mortazavi et al., 2008): the number of reads mapping to the junction site of the transcript, divided by the total number of reads in millions, divided by the length of the junction site in kilobases. The junction site length is 138 nt for human and fruit fly and 88 nt for mouse.
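The RPKM calculation described above can be written out as a short Python sketch. The function name, argument names and the example counts are illustrative assumptions and are not taken from the ChiTaRS code base; only the formula and the junction lengths come from the text.

```python
# Junction site lengths (nt) as stated in the text.
JUNCTION_LENGTH_NT = {"human": 138, "fruit fly": 138, "mouse": 88}


def junction_rpkm(junction_reads: int, total_reads: int, junction_length_nt: int) -> float:
    """Reads per kilobase of junction per million reads (hypothetical helper)."""
    if total_reads <= 0 or junction_length_nt <= 0:
        raise ValueError("total_reads and junction_length_nt must be positive")
    # Normalize by sequencing depth: reads per million reads.
    reads_per_million = junction_reads / (total_reads / 1_000_000)
    # Normalize by junction length expressed in kilobases.
    return reads_per_million / (junction_length_nt / 1_000)
```

For instance, with hypothetical counts of 69 junction reads out of 50 million total reads and a 138 nt human junction, `junction_rpkm(69, 50_000_000, JUNCTION_LENGTH_NT["human"])` gives an RPKM of approximately 10.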
One of the key novelties of our database is the calculation and ranking of chimeric junction consistency. The ChiTaRS database contains transcripts that are chimeras of two genes, and in some cases there is evidence that the same two genes participate in many chimeras. The junction consistency ranking is a measure of how many times the same junction between the same genes has been found in chimeric transcripts. Thus, if several chimeras of the same two genes have their junction sites at the same genomic location, allowing a difference of no more than 1,000 nucleotides (an empirical threshold that can be changed in the "Search" options), the junction rank is high (see "Full Collection & Search"). The junction consistency is particularly useful for experimental work: highly ranked chimeras are good candidates for verification in cells by PCR, RT-qPCR or other techniques, which reduces the chance of dealing with chimeras that are mere artifacts.
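The idea of junction consistency can be illustrated with a minimal Python sketch. The exact ranking formula used by ChiTaRS is not given here, so the scoring below (counting chimeras of the same gene pair whose junction coordinates agree within the threshold on both genes) is an assumption, as are the `Chimera` record and its field names.

```python
from collections import defaultdict
from typing import NamedTuple


class Chimera(NamedTuple):
    # Hypothetical record layout, for illustration only.
    chimera_id: str
    gene_a: str
    gene_b: str
    pos_a: int  # junction coordinate within gene_a
    pos_b: int  # junction coordinate within gene_b


def junction_consistency(chimeras, max_distance=1000):
    """Count, per chimera, the chimeras of the same gene pair (itself included)
    whose junction lies within max_distance nt on both genes."""
    by_pair = defaultdict(list)
    for c in chimeras:
        by_pair[(c.gene_a, c.gene_b)].append(c)

    counts = {}
    for group in by_pair.values():
        for c in group:
            counts[c.chimera_id] = sum(
                abs(c.pos_a - o.pos_a) <= max_distance
                and abs(c.pos_b - o.pos_b) <= max_distance
                for o in group
            )
    return counts
```

A higher count means that the same junction has been observed more often for that gene pair; `max_distance` plays the role of the adjustable 1,000-nucleotide threshold.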
An additional feature of ChiTaRS is the visualization of chimeric transcripts and their genomic context, including the junction site. The visualization figures were produced using the SpliceGrapher package, which was designed to predict splice graphs for a gene by combining evidence from RNA-seq data, annotated gene models and EST alignments. To produce splice graphs for chimeras, we first used GMAP to align the ESTs to a reference genome (H. sapiens version GRCh37.62, D. melanogaster version BDGP R5/dm3 and M. musculus version NCBI37/mm9); SpliceGrapher was then used to convert the resulting alignments into splice graphs. Finally, we used SpliceGrapher's visualization modules to integrate the ESTs and gene models into figures that illustrate chimeric splicing. Each figure shows how the ESTs align across two genes, making it possible to envisage the potential transcripts that could arise from each chimera (see "Full Collection and Search").
The ChiTaRS database provides evidence of chimeric transcripts and their mapping by RNA-seq reads from three higher eukaryotes: human, mouse and fruit fly. The database also allows the investigation of transcripts that incorporate the same orthologous genes in different organisms. An interesting example is the human chimera, ChimeraID='AW882230', and the mouse chimera, ChimeraID='CF577921'. Both incorporate the PTMS gene (Parathymosin, which may mediate immune function) and both are confirmed by RNA-seq reads in their respective organisms. This RNA-seq coverage therefore provides evidence of chimeras that involve the same orthologous genes in different eukaryotes and that might be conserved during evolution. ChiTaRS takes the first step in this direction, and one of its main future goals is the study of the evolution of chimeric transcripts.
High-throughput chromosome conformation capture (Hi-C) is a method for identifying chromatin interactions across an entire genome. Hi-C experiments measure the frequencies of contacts between all pairs of loci in the genome, and thus quantify the interactions between all possible pairs of fragments simultaneously. The ChiTaRS database provides evidence of Hi-C contact points in the chimeras.
The ChiTaRS database includes Hi-C chromatin contact maps from public datasets for four organisms: human, mouse, fruit fly and yeast. We have included nearly 5,600 chimeras, spanning 14 cell lines and tissue samples. The following Hi-C resources were used in the ChiTaRS database:
|Organism||GEO link||Library/cell line||.hic file(s)|
|Human||GSE63525||GM12878||[GSE63525 GM12878 dilution combined], [GM12878 combined in situ from Cell 2014]|
|Human||GSE63525||HUVEC||[GSE63525 HUVEC combined], [HUVEC 1in situ combined]|
|Human||GSE63525||NHEK||[NHEK combined], [NHEK 1in situ combined]|
|Human||GSE63525||HMEC||[HMEC combined], [HMEC 1in situ combined]|
|Human||GSE63525||K562||[K562 combined], [K562 in situ combined]|
|Human||GSE63525||KBM-7||[KBM7 combined], [KBM7 in situ combined]|
|Human||GSE63525||IMR90||[IMR90 combined], [IMR90 1in situ combined]|
|Human||GSE63525||HeLa||[HeLa in situ combined]|
|Human||GSE35156||hESC||[hESC.txt.gz], [hESC replicate.txt.gz], [hESC combined.hic]|
|Human||GSE35156||IMR90||[IMR90.txt.gz], [IMR90 replicate.txt.gz], [IMR90 combined.hic]|
|Mouse||GSE71831||Patski||[Patski], [PNAS 2016 Patski combined]|
The use of distinct aliases for the same gene is one of the main problems when dealing with different gene, protein and transcript databases, and it can be a source of duplicated entries. In ChiTaRS, we use a dedicated table to map synonymous gene names to a unique record, using the NCBI Entrez gene name as the key. We have so far performed four updates of the ChiTaRS database, each after manual verification of the entries and cancer breakpoints. Each update is checked automatically for synonymous gene names so that the name is unique for both genes incorporated into each chimera. Thus, all entries currently appearing in ChiTaRS have unified gene names, and as a result searches can be performed based on gene names and synonyms (under "Full Collection & Search").
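A synonym table of this kind can be sketched in Python as a lookup from every known alias to one canonical name. The function names and the placeholder gene names below are hypothetical and only illustrate the data structure; they do not reproduce the actual ChiTaRS table.

```python
def build_alias_index(records):
    """Build a case-insensitive lookup from each name or alias to its
    canonical gene name. `records` maps canonical name -> list of aliases."""
    index = {}
    for canonical, aliases in records.items():
        for name in (canonical, *aliases):
            key = name.upper()
            # An alias claimed by two different genes cannot be unified safely.
            if key in index and index[key] != canonical:
                raise ValueError(f"ambiguous alias: {name!r}")
            index[key] = canonical
    return index


def unify(name, index):
    """Return the canonical gene name for `name`, or None if it is unknown."""
    return index.get(name.strip().upper())


# Placeholder names for illustration; real entries would be keyed by the
# NCBI Entrez gene name.
alias_index = build_alias_index({"GENE1": ["G1", "Gene1-alt"], "GENE2": ["G2"]})
```

Searching by any synonym then reduces to calling `unify` on the query before looking up chimeras by canonical name.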
The ChiTaRS study encompassed 33 cancer types, for which data were available from at least one of the breakpoint/fusion:
For a complete list of the TCGA cancer-type abbreviations, please see here.
The "Compare and Analyze" option makes it possible to observe conserved junction sites between two organisms of interest. Users can choose two organisms and analyze the conservation of junction sites in the chimeras of these organisms using the rank and the consistency analysis.
"Junction Search" provides the option to screen the list of RNA-seq reads found at the junction sites of chimeras in order to identify putative junction sites in novel sequences provided by a user. The "DNA search" is available for all three organisms in the database, and either the transcript sequence or the GenBank accession number can be used as input. The search is an automatic procedure that identifies a junction site in the transcript entered by the user and aligns the previously found "chimeric" RNA-seq reads to this junction site. This feature of ChiTaRS allows users to assess to what extent their chimeric transcripts are similar to those for which there is RNA-seq data in the database, so that they can analyze their chimeras in the setting of a large high-throughput dataset with multiple sequences. In the downloads section we provide all the potential "chimeric" reads, which enables the user to search for junction coverage among other chimeric transcripts available in different databases.
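The idea of checking which reads cover a junction site can be illustrated with a small Python sketch. It is not the ChiTaRS procedure: the alignment method used by the service is not described here, so the exact substring matching, the `min_overhang` parameter and the function name are all assumptions made for illustration.

```python
def reads_spanning_junction(transcript, junction, reads, min_overhang=10):
    """Return (read, start) pairs for reads that match the transcript exactly
    and cover at least `min_overhang` nt on each side of the junction.

    `junction` is the 0-based index of the first base after the junction.
    """
    hits = []
    for read in reads:
        start = transcript.find(read)
        while start != -1:
            end = start + len(read)
            # The read must extend far enough to the left and to the right.
            if start <= junction - min_overhang and end >= junction + min_overhang:
                hits.append((read, start))
            start = transcript.find(read, start + 1)
    return hits
```

A real search would need to tolerate mismatches and would typically rely on a dedicated aligner; exact matching is used here only to keep the example self-contained.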
The ChiTaRS database provides not only extended "Search" options but also the possibility to download all the database tables and datasets in a user-friendly manner. The full human, mouse and fruit fly collections include information on the two genes incorporated into the chimeras, the sequence identity and the positions of the junction sites. In addition, the freely available RNA-seq results, all the unmapped "chimeric" RNA-seq reads and the mass-spectrometry results are downloadable for each organism, as are full collections of all chimera genes for UniProtKB.