Datasets and Annotation
All publicly reported chimeric RNAs (7829 transcripts) were considered for the analysis.
The chimeric ESTs reported by Li et al.
(380 transcripts for Homo sapiens;
200 transcripts each for Mus musculus and Drosophila melanogaster;
and 5 transcripts for Rattus norvegicus, Bos taurus, Danio rerio, Saccharomyces cerevisiae and Sus scrofa)
were used together with all chimeric ESTs (6178 sequences) and mRNAs (1046 sequences)
from ChimerDB.
All chimeric RNAs have well-defined junction sites (at least six nucleotides on each side of the junction).
However, only a few chimeric sequences have canonical splice-junction sites.
For the dataset of Li et al., a UCSC BLAT search was used to find sequence similarity
between the chimeric RNA transcripts and human genomic regions in order to annotate the genes participating
in each chimera.
All aligned exons, introns and untranslated regions in the chimeras were then identified using known
transcript information from ENSEMBL.
The BLAST method was applied to recognize the corresponding protein domains for every exon of chimeric mRNAs.
Finally, WU BLAST was employed when short or otherwise unusual genomic regions were found, in order to establish
their identity more precisely, because WU BLAST was shown to be most efficient when the transcript composition is unknown.
Mapping the Chimeric Transcripts by the RNA-Seq Paired Reads
The RNA-seq reads were first mapped to the human genome and to annotated exon junctions.
The reads that had not mapped at these stages were then selected and mapped to the chimeric transcripts.
Only reads that mapped precisely across the junction of the chimera, with a minimum of 6 nucleotides (nt)
on each side of the junction (or 5 nt for the short, 50-nt paired-end reads), were selected.
This protocol is stringent, as it ensures that if a read maps both to a known transcript and to a chimeric transcript,
it will be assigned to the known transcript.
All the mappings were performed using GEM, allowing for a maximum of 3 mismatches.
The same procedure was applied for chimeric transcripts from mouse and fruit fly.
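The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline that was actually run: the `ReadHit` record, the function names and the 0-based junction coordinate are hypothetical, and treating every read of 50 nt or less as "short" is an assumption (the text only mentions 50-nt reads).

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ReadHit:
    """Hypothetical record for one read aligned to a chimeric transcript."""
    read_id: str
    start: int       # 0-based start of the alignment on the chimera
    length: int      # aligned read length in nt
    mismatches: int  # number of mismatches in the alignment


def min_overhang(read_length: int) -> int:
    """Required bases on each side: 5 nt for short reads, otherwise 6 nt.

    Assumption: "short" means 50 nt or less.
    """
    return 5 if read_length <= 50 else 6


def spans_junction(hit: ReadHit, junction: int, max_mismatches: int = 3) -> bool:
    """True if the read covers the junction with enough bases on both sides.

    `junction` is the 0-based index of the first base of the downstream partner.
    """
    left = junction - hit.start                # bases upstream of the junction
    right = hit.start + hit.length - junction  # bases downstream of the junction
    need = min_overhang(hit.length)
    return left >= need and right >= need and hit.mismatches <= max_mismatches


def select_junction_reads(hits: Iterable[ReadHit], junction: int,
                          already_mapped: Set[str]) -> List[ReadHit]:
    """Keep junction-spanning reads that did not map to the genome or known junctions."""
    return [h for h in hits
            if h.read_id not in already_mapped and spans_junction(h, junction)]
```

Excluding `already_mapped` reads first mirrors the stringency described above: a read that fits both a known transcript and a chimera is never counted for the chimera.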
Visualization of Chimeras by SpliceGraphs
ChiTaRS-3.1 also provides visualization of chimeric transcripts and their genomic context,
including the junction site.
These figures were generated using the SpliceGrapher package,
which was designed for analysis and visualization of RNA-Seq data.
These figures highlight the genes on either side of a chimeric junction, making it possible to visualize
the potential transcripts that could arise from each chimera.
Identification of Chimeric Proteins by Mass-Spectrometry Experiments
To discover chimeras at the protein level, peptide mass spectra from human proteomics experiments were taken
from two publicly available proteomics databases, GPM and PeptideAtlas.
The GPM set consisted of 5,809 spectra files in mzXML format and the PeptideAtlas set of 52,019 spectra files
in mzXML format.
The unique peptides were identified by searching against the GENCODE annotation of the human genome.
Since the GENCODE annotation is still not complete for all human genes, it was only possible to distinguish peptides
that map to the GENCODE annotations.
To evaluate the identified peptides statistically, the overall False Discovery Rate (FDR) was estimated.
A decoy database was produced for the 62,943 unique transcripts from the 22,027 unique genes of GENCODE.
The target/decoy strategy has been designed to accomplish this task by means of a random synthetic protein database
(a decoy database) that preserves the general composition of the target database but does not overlap with it.
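One common way to obtain such a decoy database is to shuffle the residues of each target sequence, which keeps the amino-acid composition and length intact. The sketch below is illustrative only: the text does not state how the decoys were generated, so the shuffling approach, the `DECOY_` name prefix and both function names are assumptions, and the overlap check is done only at the level of whole sequences, not individual peptides.

```python
import random
from typing import Dict


def make_decoy(sequence: str, rng: random.Random) -> str:
    """Return a shuffled copy of the sequence (same residues, random order)."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def build_decoy_db(targets: Dict[str, str], seed: int = 0,
                   max_tries: int = 10) -> Dict[str, str]:
    """Build one shuffled decoy per target sequence.

    A decoy that happens to equal a target sequence is reshuffled; if it still
    collides after `max_tries` attempts (e.g. a homopolymer), it is skipped so
    that the decoy set never contains a target sequence.
    """
    rng = random.Random(seed)  # fixed seed keeps the decoy set reproducible
    target_seqs = set(targets.values())
    decoys: Dict[str, str] = {}
    for name, seq in targets.items():
        for _ in range(max_tries):
            decoy = make_decoy(seq, rng)
            if decoy not in target_seqs:
                decoys["DECOY_" + name] = decoy
                break
    return decoys
```

Reversing each sequence is an equally common alternative; shuffling is used here only because it matches the "random synthetic" wording above.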
Peptides matching the decoy database were used to estimate the FDR, since they do not correspond to
real peptides. The threshold sensitivity (the fraction of true-positive identifications, together with the E-value)
was used to estimate the significance of the unique peptides found. Finally, chimeric transcripts whose junction site
was confirmed by one or more peptides with a combined E-value of less than 10^-4 were considered true positives.
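As a rough illustration of the target/decoy estimate, the FDR at a given E-value threshold can be approximated as the number of decoy matches passing the threshold divided by the number of target matches passing it. The exact estimator is not given in the text, so this simple ratio, the function name and the input format (plain lists of E-values from separate target and decoy searches) are assumptions.

```python
from typing import Iterable


def estimate_fdr(target_evalues: Iterable[float],
                 decoy_evalues: Iterable[float],
                 threshold: float = 1e-4) -> float:
    """Approximate the FDR as (decoy hits) / (target hits) below the threshold.

    Assumes target and decoy databases were searched separately and are of
    comparable size, so decoy hits approximate the false target hits.
    """
    n_target = sum(1 for e in target_evalues if e < threshold)
    n_decoy = sum(1 for e in decoy_evalues if e < threshold)
    if n_target == 0:
        return 0.0  # no accepted identifications, so nothing to be false
    return n_decoy / n_target


# Usage with made-up E-values: 4 target hits and 1 decoy hit pass 1e-4.
example_fdr = estimate_fdr([1e-6, 1e-5, 1e-3, 1e-7, 5e-5],
                           [1e-2, 5e-5, 0.5])
```

Lowering the threshold generally reduces the estimated FDR at the cost of sensitivity, which is the trade-off the threshold choice above is balancing.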
Li, X. et al. (2009)
Short homologous sequences are strongly associated
with the generation of chimeric RNAs in eukaryotes.
J Mol Evol. 68(1):56-65.
Kim, P. et al. (2010)
ChimerDB 2.0-a knowledgebase for fusion genes updated.
Nucleic Acids Res. 38(Database issue): D81-5.
GEM Tool and Library.
Rogers, M.F. et al. (2012)
SpliceGrapher: detecting patterns of alternative splicing
from RNA-Seq data in the context of gene models and EST data.
Genome Biol. 13(1): R4.
Tress, M.L. et al. (2008)
Proteomics studies confirm the presence of alternative protein
isoforms on a large scale.
Genome Biol. 9(11): R162.