Genes generate various transcripts through alternative splicing, and these transcripts can have diverse functions. However, most transcriptome studies have used short-read sequencing technologies (next-generation sequencers), which do not observe full-length transcripts directly. Although long-read sequencing technologies can sequence full-length transcripts, analyzing the resulting data is difficult. In the present study, we developed an analysis pipeline named SPLICE to analyze full-length cDNA sequences. Using this pipeline, we analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous liver samples with Oxford Nanopore technology. Our analysis detected 46,663 transcripts from protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. Comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Importantly, 746 genes with DETs were not identified by the gene-level analysis. We also identified novel exons derived from transposable elements (TEs). In the analysis of transcripts from hepatitis B virus (HBV), HBx-human TE fusions were found to be overexpressed in the HCCs. Furthermore, fusion gene detection revealed novel recurrent fusion events. These results suggest that long-read sequencing technologies allow us to analyze full-length transcripts, and they show the importance of splicing variants in carcinogenesis.