Lucile Broseus committed
**TALC: Transcriptome-Aware Long Read Correction**
___________________________________________________

**Requirements:**

* Jellyfish2   

Currently, TALC uses the k-mer count table dumped by the Jellyfish2 k-mer counter.  

Jellyfish2 can be downloaded from: https://github.com/zippav/Jellyfish-2  

Possible command lines to generate a suitable dump file with Jellyfish2 (the k-mers must be non-canonical, so do not pass `-C`/`--canonical` to `jellyfish count`):

*For paired-end short read data:*  

```
jellyfish count --mer $kmerSize -s 100M -o $out.jf -t $nthreads $SRfq1 $SRfq2  
jellyfish dump -c $out.jf > $out.dump
```

*For single-end short read data:*  

```
jellyfish count --mer $kmerSize -s 100M -o $out.jf -t $nthreads $SRfq  
jellyfish dump -c $out.jf > $out.dump
```
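Before running TALC, it can help to sanity-check the dump. The function below is a hypothetical helper (`check_dump` is not part of TALC or Jellyfish). It assumes the column format written by `jellyfish dump -c`, one `KMER COUNT` pair per line, and checks that every k-mer uses only A, C, G, T, that every count is an integer, and that all k-mers have the same length.

```shell
# check_dump FILE: hypothetical helper, not part of TALC or Jellyfish.
# Assumes the column format written by `jellyfish dump -c` ("KMER COUNT" per line).
# Returns 0 if every line is well-formed and all k-mers share one length;
# otherwise reports the first problem on stderr and returns 1.
check_dump() {
  awk '
    NF != 2 || $1 !~ /^[ACGT]+$/ || $2 !~ /^[0-9]+$/ {
      printf "line %d: malformed entry: %s\n", NR, $0 > "/dev/stderr"
      bad = 1; exit 1
    }
    NR == 1 { k = length($1) }
    length($1) != k {
      printf "line %d: k-mer length %d, expected %d\n", NR, length($1), k > "/dev/stderr"
      bad = 1; exit 1
    }
    END {
      if (bad) exit 1
      if (NR == 0) { print "empty dump file" > "/dev/stderr"; exit 1 }
    }
  ' "$1"
}
```

For example, `check_dump $out.dump && echo ok` prints `ok` only if the whole file passes these checks.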

______________________________________________

**Running TALC**

```
# $LReads      : file containing the long reads, in FASTA or FASTQ format
# $dump        : k-mer counts from your short-read dataset, as generated by jellyfish dump
# $kmerSize    : size k of the k-mers; must match the dump file
# $out         : prefix for the output files
# $num_threads : number of threads
talc $LReads \
     --SRCounts $dump \
     -k $kmerSize \
     -o $out \
     -t $num_threads
```
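Because `-k` must match the dump file, one option is to read k from the dump itself rather than typing it by hand. This is a minimal sketch, assuming the dump uses the `jellyfish dump -c` column format and that all k-mers in it have the same length. `kmer_size_of_dump` is a hypothetical helper, not a TALC option.

```shell
# kmer_size_of_dump FILE: hypothetical helper, not part of TALC.
# Prints the length of the first k-mer in a `jellyfish dump -c` file.
kmer_size_of_dump() {
  awk 'NR == 1 { print length($1); exit }' "$1"
}

# Example use (commented out so the snippet can be sourced safely):
# k=$(kmer_size_of_dump "$dump")
# talc "$LReads" --SRCounts "$dump" -k "$k" -o "$out" -t "$num_threads"
```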

Important:  
In TALC, short and long reads must be in the same orientation (the weighted de Bruijn graph is directional).
If your long reads are the reverse complement of your short reads, add the option `--reverse`.
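To make the orientation requirement concrete: since the dump is non-canonical, a k-mer and its reverse complement are stored as separate entries, so a long read in the opposite orientation shares few k-mers with the short-read table. The small helper below is illustrative only (`revcomp` is hypothetical, not part of TALC) and prints the reverse complement of a DNA string.

```shell
# revcomp SEQ: hypothetical helper that prints the reverse complement of a
# DNA sequence. tr complements A<->T and C<->G (other characters such as N
# are left unchanged), then awk reverses the string.
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}
```

For instance, `revcomp ACGTT` prints `AACGT`.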

*Using known splice junctions*

To integrate known splice junctions, create a dump file containing the k-mers that flank those junctions and pass it with the `--junctions` option:

```
# $LReads      : file containing the long reads, in FASTA or FASTQ format
# $dump        : k-mer counts from your short-read dataset, as generated by jellyfish dump
# $junc        : k-mer counts of a subset of k-mers flanking known splice junctions, as generated by jellyfish dump
# $kmerSize    : size k of the k-mers; must match the dump file
# $out         : prefix for the output files
# $num_threads : number of threads
talc $LReads \
     --SRCounts $dump \
     --junctions $junc \
     -k $kmerSize \
     -o $out \
     -t $num_threads
```
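How the junction k-mers are selected is not specified here. One possible approach, sketched under the assumption that the relevant k-mers are those straddling each junction: join the last k-1 bases of the donor side to the first k-1 bases of the acceptor side. Each of the k-1 k-mers in the resulting 2(k-1)-base string crosses the junction. `junction_span` is a hypothetical helper, not part of TALC.

```shell
# junction_span LEFT RIGHT K: hypothetical helper, not part of TALC.
# LEFT is sequence ending at the donor site, RIGHT is sequence starting at the
# acceptor site. Prints the last K-1 bases of LEFT joined to the first K-1
# bases of RIGHT; every k-mer of this string spans the junction.
junction_span() {
  awk -v l="$1" -v r="$2" -v k="$3" 'BEGIN {
    n = k - 1
    if (length(l) < n || length(r) < n) {
      print "flanking sequences shorter than k-1" > "/dev/stderr"
      exit 1
    }
    print substr(l, length(l) - n + 1) substr(r, 1, n)
  }'
}
```

Under this assumption, the strings could be written as FASTA records, e.g. `printf '>junction_1\n%s\n' "$(junction_span "$donor" "$acceptor" "$kmerSize")" >> junctions.fa` (variable and file names are hypothetical), and then counted and dumped with the same `jellyfish count` / `jellyfish dump -c` commands shown above.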