Every base in the protein-coding sequence was substituted with each of the other three bases to find the single-base substitutions that would introduce a premature stop codon. Coding regions were taken from the GENCODE19 gene annotation model, obtained from:
ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
Only protein_coding transcripts that had an annotated START and STOP were used.

The score file gives ALoFT scores for all base substitutions that lead to a premature stop codon. The score is calculated for the longest coding transcript (based on protein length). In cases where several transcripts of a gene are tied for the longest length, one of them was chosen at random.

Fraction_transcripts_affected = No. of transcripts affected by the SNP / Total no. of protein_coding transcripts for the gene

Affected_transcripts indicates whether 'all' transcripts of a gene are affected by a premature stop-causing SNP or only 'some' of the transcripts are affected.

When ALoFT returns similar classification probabilities for two or more classes, the predicted class is uncertain. By calculating the standard deviation of class probabilities across our 40 trained random forest models, we obtain a 95% confidence interval for ALoFT predictions. If the confidence interval of the predicted class probability overlaps with the confidence interval of either of the two less likely classifications (single-sided test), we attach the label 'Low Confidence' (p > 0.05) to the prediction. Otherwise the prediction is labeled 'High Confidence' (p < 0.05).
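The steps described here can be illustrated with a minimal Python sketch. It is not the ALoFT code itself. All function names are hypothetical, positions are 0-based offsets into a coding sequence string (the score file presumably reports genomic coordinates), and the confidence intervals are taken as given inputs because the exact way they are derived from the 40 models is not spelled out here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def premature_stop_substitutions(cds):
    """List (position, ref, alt) for every single-base substitution that
    turns a sense codon into a stop codon.

    Assumes `cds` is an in-frame coding sequence that starts at the
    annotated START and ends with the annotated STOP codon.
    """
    cds = cds.upper()
    hits = []
    # Skip the final codon: it is already the annotated stop.
    for pos in range(len(cds) - 3):
        offset = pos % 3
        codon = cds[pos - offset:pos - offset + 3]
        for alt in BASES:
            if alt == cds[pos]:
                continue
            mutated = codon[:offset] + alt + codon[offset + 1:]
            if mutated in STOP_CODONS:
                hits.append((pos, cds[pos], alt))
    return hits


def transcripts_affected(n_affected, n_total):
    """Return (Fraction_transcripts_affected, Affected_transcripts)."""
    fraction = n_affected / n_total
    label = "all" if n_affected == n_total else "some"
    return fraction, label


def confidence_label(intervals, predicted):
    """Label a prediction by interval overlap.

    `intervals` maps each class name to a (low, high) confidence interval
    for its probability; `predicted` is the most likely class.
    """
    p_low, p_high = intervals[predicted]
    for cls, (low, high) in intervals.items():
        # Two closed intervals overlap when each starts before the other ends.
        if cls != predicted and low <= p_high and p_low <= high:
            return "Low Confidence"
    return "High Confidence"
```

For example, premature_stop_substitutions("ATGTGGTAA") returns [(4, "G", "A"), (5, "G", "A")], because TGG can become TAG or TGA with a single substitution. The class names passed to confidence_label are placeholders chosen by the caller.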