Novoalign version 3 is a major release of the most accurate NGS read aligner. Improvements have been made for longer reads and more accurate indel detection. Go to http://www.novocraft.com and download it now to get started.
A new review shows Novoalign being the aligner of choice for more accurate variant detection. Get the full article at http://www.edgebio.com/exome-data-analysis-pipeline.
Release Novoalign V2.08.02 & NovoalignCS V1.02.02, Novosort (V1.0)
Please download from http://www.novocraft.com. The new release details are below.
#Fix: GATK complains if the CIGAR field contains a multi-base substitution encoded as adjacent insert & delete operations. Multi-base substitutions are now encoded in the CIGAR as M’s. This does not change the dynamic programming algorithm or the scoring. Internally a long substitution may still be scored as adjacent inserts and deletes, but it is then shown in the CIGAR as mismatches plus an indel to make up any length difference.
#Removed input file buffering when reading from pipes. This allows Novoalign to be used as a service.
#Fix: If a paired read sequence contains many invalid IUB NA codes then Novoalign performance is reduced. Rather than rejecting read sequences with invalid base codes Novoalign treats them as mismatches to all bases. This fix will QC any read with more than 8 invalid base codes.
#Fix: If option --hdrhd off was used on NovoalignMPI the reads were processed as single end rather than paired end.
#Fix: When using option --rOQ to report pre-calibrated colour qualities the qualities on -ve strand alignments were reversed relative to the CS & CQ tags. The fix reports CS, CQ & OQ in the same direction as the original read.
#Fix: Novoutil was completing with odd exit status values. Exit status is now zero unless there was an error.
#Fix: If quality attribute was ‘.’ then SNPs were rejected as low quality. We now accept these SNPs
#Fix: If genotype couldn’t be determined from the GT, AC or AC1 attribute, SNPs were assumed to be homozygous for the alt allele. We now assume heterozygous for reference & alt allele. Note. This only has an effect when the -g option is used.
#Fix: SNPs would not match chromosome names if the case was different. We now do a case insensitive compare of names.
#Fix: In some situations the sort was not “stable” and could change the order of two alignments with the same alignment location. A stable sort is one where if two records have the same sort key their relative positions are not changed. This was likely to happen if the input BAM file size was between 1 & 2 times the amount of RAM allocated for the sort.
#Fix: The -a option was not being recognised.
#Fix: If an @RG record was given as an option and the input BAM file already had an @RG record then the replacement RG tag was only substituted on alignments that had an RG:Z: tag. Alignments without RG tags stayed that way.
#Added option for name sort
#Added option to create a BAM index for the final sorted bam file
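To illustrate the CIGAR fix at the top of this list, here is a minimal Python sketch of the idea: adjacent insert and delete operations are rewritten as M's, with a single indel left over to make up any length difference. This is not Novoalign's code. The function names are hypothetical, and placing the leftover indel after the M's is an assumption made for the example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def merge_substitutions(cigar):
    """Rewrite runs of adjacent I/D operations as M's plus one leftover indel."""
    ops = parse_cigar(cigar)
    out = []
    i = 0
    while i < len(ops):
        n, op = ops[i]
        if op in "ID":
            # Total up the whole run of adjacent insert/delete operations.
            ins = dele = 0
            while i < len(ops) and ops[i][1] in "ID":
                if ops[i][1] == "I":
                    ins += ops[i][0]
                else:
                    dele += ops[i][0]
                i += 1
            paired = min(ins, dele)  # bases that can be shown as mismatches
            if paired:
                out.append((paired, "M"))
            if ins > paired:
                out.append((ins - paired, "I"))
            if dele > paired:
                out.append((dele - paired, "D"))
        else:
            out.append((n, op))
            i += 1
    # Join neighbouring operations of the same type, e.g. 10M + 3M -> 13M.
    merged = []
    for n, op in out:
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))
    return "".join(f"{n}{op}" for n, op in merged)
```

For example, `merge_substitutions("10M2I5D20M")` gives `12M3D20M`: the read length (32) and the reference span (35) are both unchanged, only the way the substitution is displayed differs.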
We have released new versions of Novoalign and NovoalignCS. Please visit our downloads page to get these latest offerings.
- Fixed problem where if read was hard clipped and soft clipped the soft clipping was reported first. Hard clipping should always be first.
- Sensitivity/Specificity neutral performance improvements in paired end mode.
- Change in paired end processing that reports additional pairs where one end has a low alignment score and the other end a very high alignment score. These were previously reported as an alignment for the low scoring read and a “No Match” for the high scoring read.
- NovoalignCS – When calling Nucleotide sequence for a colour space read if a base has quality <=3 we now call the base as an N rather than as the reference base.
- NovoalignMPI* – Added option to disable memory mapping of the index file. This can improve performance on servers with NUMA memory architecture.
- NovoalignMPI* – Force processor affinity if the number of threads is the same as the number of cores on the server for a slave process.
- Novobarcode – Several changes including adapter trimming and support for QSEQ read files.
- Changes in Bi-Seq mode to better support calling methylation status, including a new tag to indicate the CT/GA strand of alignment and a new program, novomethyl, that calls methylation status from a samtools mpileup file. Notes on using novomethyl can be found in our wiki at http://tinyurl.com/Novocraft-BiSeq
- Added an option to report 3′ alignment location.
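The clipping order rule in the first item above can be checked with a few lines of Python. This is a sketch of a validator based on the SAM convention that H may only be the first or last CIGAR operation and that S may only have H between it and the ends. The function name is hypothetical and this is not part of Novoalign.

```python
import re

def clip_order_ok(cigar):
    """Return True if hard and soft clips appear in a valid order."""
    ops = re.findall(r"\d+([MIDNSHP=X])", cigar)
    # Hard clips are only allowed as the very first or very last operation.
    for i, op in enumerate(ops):
        if op == "H" and i not in (0, len(ops) - 1):
            return False
    # Strip the outer hard clips, then soft clips must sit at the ends.
    core = ops[:]
    if core and core[0] == "H":
        core = core[1:]
    if core and core[-1] == "H":
        core = core[:-1]
    for i, op in enumerate(core):
        if op == "S" and i not in (0, len(core) - 1):
            return False
    return True
```

With this check, `5H3S40M` passes while `3S5H40M` (soft clip reported before the hard clip) fails.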
In more detail:
- For Bi-Seq alignments in SAM report format we add tag ZB:Z:mode where mode is either CT or GA and indicates which mode/strand the read was aligned in. Reads aligned in CT mode are usually from the 5′-3′ strand of the chromosome. A simple methylation analysis pipeline could be constructed by splitting alignments into two SAM files using the ZB tag and then running samtools pileup on each file. Non-methylated cytosines should show up as ‘T’ on the CT alignments and as ‘A’ on the GA alignments. More details on our wiki at http://tinyurl.com/Novocraft-BiSeq
- Change to automatic fastq format detection to allow -F STDFQ even if quality range looks like Illumina fastq format. We recommend always using the -F option to ensure correct interpretation of quality values.
- For Bi-Seq alignments if a read aligns in the same location and direction and with the same score in both GA & CT modes (typically there are no unmethylated cytosines) then we choose randomly whether to report as CT or GA aligned. Earlier versions would have biased reporting to the CT mode. Note. This has no effect if your protocol has preserved strand and you are using the -b2 option, which should be the case if you are using the Illumina protocol.
- In SAM report format when reporting multiple alignments per read and one read of a pair is unaligned then the mate location is now shown as the primary alignment location.
- Support for read files compressed with bzip2. i.e. *.bz2 files.
- Change to alignment seeding to allow drop back to a single seed location if other seed locations have too many low quality bases. More mismatches are allowed in the single seed so that sensitivity is not affected.
- Added option --3Prime that enables reporting of the 3′ locus of alignments. This will appear in SAM files as tag Z3:i:9999
- When running multi-threaded with exactly one thread per CPU core (default) we now set processor affinity for each thread to force specific CPU per thread. This overcomes problems with Linux CFS scheduler where one or two cores may be idle while running a single multi-threaded job.
- Beta release of methylation status caller. Please refer to our wiki at http://tinyurl.com/Novocraft-BiSeq
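The ZB tag split suggested in the first item above could look something like the following Python sketch. It works on SAM lines already read into memory, and copies header lines to both outputs so each half remains a valid SAM file. The function names are hypothetical, and reading and writing the actual files is left out.

```python
def zb_mode(sam_line):
    """Return the ZB:Z value ('CT' or 'GA') of a SAM record, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM alignment record.
    for tag in fields[11:]:
        if tag.startswith("ZB:Z:"):
            return tag[5:]
    return None

def split_by_zb(lines):
    """Partition SAM lines into CT and GA lists, keeping headers in both."""
    out = {"CT": [], "GA": []}
    for line in lines:
        if line.startswith("@"):
            out["CT"].append(line)
            out["GA"].append(line)
            continue
        mode = zb_mode(line)
        if mode in out:  # records without a ZB tag are dropped
            out[mode].append(line)
    return out
```

Each of the two lists can then be written to its own SAM file and passed to samtools for the pileup step.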
We are pleased to announce a new release of Novoalign and NovoalignCS. Please visit our Downloads section at http://www.novocraft.com to get these new releases.
In this release we have addressed some Picard validation problems to do with mate alignment locations and insert size.
One remaining Picard validation issue is that alignments can be hard clipped by -H or -a option and also soft clipped as a result of Smith-Waterman alignment. In Picard V1.35 and earlier this was flagged as an error. Corrections have been made to Picard and should be in V1.36 when released.
In command line processing we now validate that readgroup record includes ID: tag. Applies to -o SAM “@RG\tID:…” option.
Fix for Picard validation failure when the -Q option was used. An alignment was being reported with its mate flagged as mapped, but the mate was then reported as unmapped if its alignment quality was below the -Q reporting limit. The -Q option is really redundant for use with SAM format and we may remove it in a future release.
Correct problem where insert size was +1bp for non proper pairs that aligned on the same chromosome.
Fix Picard validation problem where mate alignment location (MRNM) may differ by 1bp from actual mate alignment.
In miRNA mode and SAM report format (option -m -oSAM) add custom tags ZH:i: & ZL:i: for hairpin score and alignment location.
Corrections to CIGAR & MD fields for case where alignments are soft-clipped.
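As an illustration of the read group validation mentioned above, a check that an @RG record carries an ID: tag might be sketched in Python as follows. This is not the check Novoalign performs internally. The function name is hypothetical, and the sketch assumes the record may arrive with either real tabs or the two-character sequence backslash-t as typed on a command line.

```python
def read_group_has_id(rg_record):
    r"""Return True if an '@RG\t...' record has a non-empty ID: field."""
    # Accept a literal backslash-t as a field separator as well as a real tab.
    text = rg_record.replace("\\t", "\t")
    fields = text.split("\t")
    if not fields or fields[0] != "@RG":
        return False
    # The ID: tag must be present and must carry a value.
    return any(f.startswith("ID:") and len(f) > 3 for f in fields[1:])
```

A record such as `@RG\tID:lane1\tSM:sample1` passes, while `@RG\tSM:sample1` is rejected.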
Novoalign used in Science Journal Publication: High-Resolution Analysis of Parent-of-Origin Allelic Expression in the Mouse Brain
Gregg and coworkers used Novoalign to map massively-parallel sequencing reads to the mouse genome. The results revealed interesting insights into the parental bias of gene expression in brain development.
Get Novoalign from http://www.novocraft.com
For the full publication see:
Gregg C, Zhang J, Weissbourd B, Luo S, Schroth GP, Haig D, Dulac C. High-resolution analysis of parent-of-origin allelic expression in the mouse brain. Science. 2010 Aug 6;329(5992):643-8. Epub 2010 Jul 8.
- Full support for Illumina Mate pairs including circularisation junction inside a read.
- The default SAM report format is now to Softclip alignments back to best local alignment.
- Support for 454 Paired end reads.
We are happy to announce NovoalignCS V1.01 for use with the Life Technologies SOLiD™ platform. The aligner is a culmination of hard work by our development team to produce an accurate tool for working with colorspace reads.
This version sees some performance improvements and bug fixes. A message passing interface (MPI) version is also available for doing alignments on multiple compute nodes.
Novocraft Technologies are proud to announce the beta release of NovoalignCS. NovoalignCS is based on our popular Novoalign software for short read alignment.
NovoalignCS makes effective use of the color space double encoding technology used by the ABI SOLiD™ system.
- Supports single end and mate pair libraries
- Discovery of color errors in short reads
- Base quality calibration.
- Support for gzipped input files.
- Multi-threaded for extra performance.
- Full index assisted Needleman-Wunsch alignment with affine gap penalties, colour transitions, and quality based scoring.
- Phred style scoring and alignment qualities
- Iterative alignment process.
- Indexed genome loaded into shared memory for reuse in multiprocessing
- Can use csfasta (with qual file) & csfastq input formats.
- For reads with multiple good alignments can report: none; one by random selection; or all alignments.
- Also an option to report all alignments up to some maximum phred score P(R|Ai)
- Posterior alignment probability (Quality Score) and filter.
- Quality filter for reads can remove low quality and polyclonal reads.
- Supports lower case masking of genome.
- Supports varying read lengths
- Iterative read trimming until read aligns
- SAM and native output formats.
- Structural Variation penalty.
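To make the Phred style scoring and posterior alignment probability items above more concrete, here is a small Python sketch. It assumes each candidate alignment Ai has a Phred-scaled score equal to -10·log10 P(R|Ai), that all candidates are equally likely a priori, and that the candidate list is complete. The function name and the `cap` parameter are hypothetical, and NovoalignCS's actual calculation may include terms not shown here.

```python
import math

def alignment_qualities(scores, cap=60.0):
    """Phred-scaled posterior quality for each candidate alignment.

    scores: Phred-scaled likelihoods, score = -10 * log10 P(R|Ai).
    cap:    hypothetical ceiling used when there is no competing alignment.
    """
    probs = [10 ** (-s / 10.0) for s in scores]
    total = sum(probs)
    qualities = []
    for i in range(len(probs)):
        # Probability mass of every alignment other than this one.
        others = sum(p for j, p in enumerate(probs) if j != i)
        if others <= 0.0:
            qualities.append(cap)
            continue
        error = others / total  # posterior probability this alignment is wrong
        qualities.append(min(cap, -10.0 * math.log10(error)))
    return qualities
```

Two equally good alignments each get a quality of about 3 (a 50% chance of being wrong), while a read whose best alignment scores 30 points better than the runner-up gets a quality of about 30.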
Please visit www.novocraft.com to download the software and find out more information.