Data Processing:
Illumina Bcl2FastQ software was used for basecalling. Sequenced reads were trimmed to remove adaptor sequence and masked for low-complexity or low-quality sequence. The remaining reads were filtered against an rRNA database to remove possible ribosomal RNA contamination and then mapped to the hg19 whole genome using TopHat2 v2.0.13 with the parameters --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2. Reads Per Kilobase of exon per Million mapped reads (RPKM) were calculated using a protocol from Chepelev et al., Nucleic Acids Research, 2009. In short, exons from all isoforms of a gene were merged to create one meta-transcript. The number of reads falling in the exons of this meta-transcript was counted and normalized by the size of the meta-transcript and by the size of the library.
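The meta-transcript RPKM calculation described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function names (merge_exons, meta_transcript_length, rpkm) are hypothetical, exon coordinates are assumed to be half-open intervals on a single chromosome and strand, and the library size is assumed to be the total number of mapped reads.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # assumed half-open: [start, end)


def merge_exons(exons: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent exons from all isoforms of one gene."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def meta_transcript_length(exons: Iterable[Interval]) -> int:
    """Total length in bp of the merged exons (the meta-transcript)."""
    return sum(end - start for start, end in merge_exons(exons))


def rpkm(read_count: int, length_bp: int, library_size: int) -> float:
    """Reads per kilobase of meta-transcript per million reads in the library.

    library_size is assumed here to be the number of mapped reads.
    """
    if length_bp <= 0 or library_size <= 0:
        raise ValueError("length_bp and library_size must be positive")
    return read_count * 1e9 / (length_bp * library_size)


# Made-up example: two isoforms of one gene with partly overlapping exons.
isoform_a = [(100, 300), (500, 700)]
isoform_b = [(250, 400), (500, 600)]
length = meta_transcript_length(isoform_a + isoform_b)  # 300 + 200 = 500 bp
value = rpkm(read_count=250, length_bp=length, library_size=10_000_000)
print(length, value)
```

With these made-up numbers the merged exons are (100, 400) and (500, 700), giving a 500 bp meta-transcript; 250 reads in a library of 10 million reads then correspond to an RPKM of 50.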