NEFFy
NEFF Calculator and MSA File Converter
Number of EFFective sequences (NEFF)

NEFF measures the sequence diversity of multiple sequence alignments (MSAs). MSAs are crucial for extracting correlated-mutation (coevolution) information and are essential for tasks such as contact map and structure prediction. NEFF has been shown to correlate strongly with prediction accuracy in models such as AlphaFold.

For an MSA, NEFF can be formulated as:

\[ \mathrm{NEFF} = \left( \frac{1}{\sqrt{L}} \right) \sum_{n=1}^{N} \frac{1}{1 + \sum_{m=1, m \neq n}^{N} I[S_{m,n} \geq thr]} \]


where \(L\) is the length (number of residues) of the sequence, \(N\) is the number of sequences in the MSA, \(S_{m,n}\) is the sequence identity between the \(m\)-th and \(n\)-th sequences, \(thr\) is the identity threshold at or above which two sequences are considered similar, and \(I\) is the Iverson bracket: \(I[S_{m,n} \geq thr]\) equals 1 if \(S_{m,n} \geq thr\) and 0 otherwise.
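The formula translates almost term by term into code. The following is a minimal Python sketch, not NEFFy's implementation. It assumes that \(S_{m,n}\) is the number of alignment columns in which both sequences carry the same non-gap residue divided by the alignment length, that \(L\) equals the alignment length, and a default threshold of 0.8; the function names `sequence_identity` and `neff` are hypothetical.

```python
import math
from typing import Sequence


def sequence_identity(a: str, b: str) -> float:
    # Assumption: identity = (columns where both sequences share the same
    # non-gap residue) / (alignment length). NEFFy's own definition may differ.
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff(msa: Sequence[str], thr: float = 0.8) -> float:
    """(1/sqrt(L)) * sum_n 1 / (1 + sum_{m != n} I[S_{m,n} >= thr])."""
    num_seqs = len(msa)
    length = len(msa[0])  # L, assumed to be the alignment length
    total = 0.0
    for n in range(num_seqs):
        # Iverson bracket I[S_{m,n} >= thr], summed over every m != n
        similar = sum(
            1
            for m in range(num_seqs)
            if m != n and sequence_identity(msa[m], msa[n]) >= thr
        )
        total += 1.0 / (1 + similar)
    return total / math.sqrt(length)


# Hypothetical toy MSA: sequences 0 and 1 are identical; every other
# pair falls below the 0.8 identity threshold.
msa = [
    "ACDEFGHIKLMNPQRS",
    "ACDEFGHIKLMNPQRS",
    "ACDEFGHIKL" + "W" * 6,
    "W" * 16,
]
print(neff(msa))  # 0.75
```

The double loop evaluates each pair twice and costs O(N^2 L); because \(S_{m,n}\) is symmetric, a practical tool would more likely compute each pair once and vectorize or parallelize the work.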

Note that \(\frac{1}{\sqrt{L}}\) is used as a normalization factor here.
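To see what the normalization does, take a hypothetical MSA with \(N = 4\) aligned sequences of length \(L = 16\), in which sequences 1 and 2 are identical and every other pair falls below \(thr\). The unnormalized sum is 3, and the \(\frac{1}{\sqrt{L}}\) factor scales it as follows:

\[ \mathrm{NEFF} = \frac{1}{\sqrt{16}} \left( \frac{1}{1+1} + \frac{1}{1+1} + \frac{1}{1+0} + \frac{1}{1+0} \right) = \frac{3}{4} = 0.75 \]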

In general, NEFF can be viewed simply as a normalized sum of the weights of all sequences in an MSA. If \(n_i\) is the number of sequences similar to sequence \(i\) (counting \(i\) itself), i.e. \(n_i = 1 + \sum_{m \neq i} I[S_{m,i} \geq thr]\), then the weight of sequence \(i\) is \(\frac{1}{n_i}\). This way of calculating NEFF has been widely used in contact and structure prediction tools [1-6].
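The weight-based view can be sketched in Python as well: compute each sequence's weight \(1/n_i\) first, then sum and normalize. This is an illustrative sketch, not NEFFy's code. It assumes identity is the number of columns with the same non-gap residue divided by the alignment length, that \(L\) is the alignment length, and a threshold of 0.8; `identity`, `sequence_weights`, and `neff_from_weights` are hypothetical names.

```python
import math
from typing import List, Sequence


def identity(a: str, b: str) -> float:
    # Assumed definition: identical non-gap columns / alignment length.
    return sum(x == y and x != "-" for x, y in zip(a, b)) / len(a)


def sequence_weights(msa: Sequence[str], thr: float = 0.8) -> List[float]:
    """Weight of sequence i is 1 / n_i, where n_i counts i itself plus
    every other sequence whose identity with i is >= thr."""
    weights = []
    for i, seq_i in enumerate(msa):
        # Count i itself explicitly: under the assumed identity definition,
        # a gappy sequence can have self-identity below thr.
        n_i = 1 + sum(
            identity(seq_i, seq_j) >= thr
            for j, seq_j in enumerate(msa)
            if j != i
        )
        weights.append(1.0 / n_i)
    return weights


def neff_from_weights(msa: Sequence[str], thr: float = 0.8) -> float:
    # Normalized sum of sequence weights: sum_i (1 / n_i) / sqrt(L)
    return sum(sequence_weights(msa, thr)) / math.sqrt(len(msa[0]))


msa = [
    "ACDEFGHIKLMNPQRS",
    "ACDEFGHIKLMNPQRS",
    "ACDEFGHIKL" + "W" * 6,
    "W" * 16,
]
print(sequence_weights(msa))   # [0.5, 0.5, 1.0, 1.0]
print(neff_from_weights(msa))  # 0.75
```

Keeping the weights as a separate list is a design choice: the same per-sequence weights are also commonly used to down-weight redundant sequences in downstream coevolution statistics.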






References

  1. Morcos, F., et al. "Direct-coupling analysis of residue coevolution captures native contacts across many protein families." Proceedings of the National Academy of Sciences 108.49 (2011): E1293-E1301.
  2. Simkovic, F., et al. "ConKit: a Python interface to contact predictions." Bioinformatics 33.14 (2017): 2209-2211.
  3. Wu, Q., et al. "Analysis of several key factors affecting DCA-based contact prediction in metagenome coevolution." Bioinformatics 35.14 (2019): 2497-2503.
  4. Zhang, J., et al. "DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins." Bioinformatics 36.5 (2020).
  5. Liu, Y., et al. "Protein contact prediction using metagenome sequence data improves fold recognition." Bioinformatics 37.12 (2021): 1770-1776.
  6. Li, Y., et al. "TripletRes: fragment-free protein structure prediction using triplet transformers." Bioinformatics 37.22-23 (2021): 4101-4107.



For further assistance or inquiries, please contact the developer or create an issue in the GitHub repository.
