NEFFy
NEFF Calculator and MSA File Converter
How to Use



C++ Executable File

NEFF Computation

Usage:

./neff --file=<input_file> [options]

To calculate NEFF, provide an MSA file and indicate the desired parameters. NEFF will be computed and presented as the result.
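When the executable is driven from a script, the command line can be assembled from a dictionary of options. The sketch below is a hypothetical helper (`build_neff_command` is not part of NEFFy); it only formats flags documented on this page and does not run the binary.

```python
def build_neff_command(files, executable="./neff", **options):
    """Assemble an argv list for the neff executable (hypothetical helper)."""
    if not files:
        raise ValueError("at least one input file is required")
    # --file takes a comma-separated list with no spaces.
    argv = [executable, "--file=" + ",".join(files)]
    for flag, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # flags use lowercase booleans
        argv.append(f"--{flag}={value}")
    return argv


cmd = build_neff_command(
    ["../MSAs/uniref90_hits.sto", "../MSAs/bfd_uniclust_hits.a3m"],
    depth=2048,
    is_symmetric=False,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the executable has been built.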

Parameters:

The code accepts the following command-line flags:

Flag Description Required Default Value Example
--file=<list of filenames> Input files (comma-separated, no spaces) containing multiple sequence alignments Yes N/A --file=my_alignment.fasta
--alphabet=<value> Alphabet of MSA
0: Protein
1: RNA
2: DNA
No 0 --alphabet=1
--check_validation=[true/false] Validate the input MSA file based on alphabet or not No false --check_validation=true
--threshold=<value> Similarity threshold for sequence weighting, must be between 0 and 1. No 0.8 --threshold=0.7
--norm=<value> Normalization option for NEFF
0: Normalize by the square root of sequence length
1: Normalize by the sequence length
2: No Normalization
No 0 --norm=2
--omit_query_gaps=[true/false] Omit gap positions of query sequence from all sequences for NEFF computation No true --omit_query_gaps=true
--is_symmetric=[true/false] Whether the sequence similarity cutoff is computed symmetrically (true: gaps are not considered in the number of differences) or asymmetrically (false: gaps are considered in the number of differences) No true --is_symmetric=false
--non_standard_option=<value> Options for handling non-standard letters of the specified alphabet
0: Treat them the same as standard letters
1: Consider them as gaps when computing similarity cutoff of sequences (only used in asymmetric version)
2: Consider them as gaps in computing similarity cutoff and checking position of match/mismatch
No 0 --non_standard_option=1
--depth=<value> Depth of MSA to be used in NEFF computation, starting from the first sequence (a value greater than the original depth is capped at the original depth) No inf (use the whole MSA) --depth=10
--gap_cutoff=<value> Threshold for considering a position as gappy and removing that (between 0 and 1) No 1 (no gappy position) --gap_cutoff=0.7
--pos_start=<value> Start position of each sequence to be considered in NEFF (inclusive) No 1 (the first position) --pos_start=10
--pos_end=<value> Last position of each sequence to be considered in NEFF (inclusive; a value greater than the length of the MSA sequences is capped at that length) No inf (consider the whole sequence) --pos_end=50
--only_weights=[true/false] Return only sequence weights, rather than the final NEFF No false --only_weights=true
--multimer_MSA=[true/false] Compute NEFF for MSA of a multimer No false --multimer_MSA=true
--stoichiom=<value> Stoichiometry of the multimer when multimer_MSA=true --stoichiom=A2B1
--chain_length=<list of values> Length of the chains in a heteromer when multimer_MSA=true and multimer is a heteromer 0 --chain_length=17,45
--residue_neff=[true/false] Compute per-residue (column-wise) NEFF No false --residue_neff=true
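To make the roles of --threshold and --norm concrete, here is a minimal sketch that assumes the widely used NEFF definition: each sequence is weighted by the reciprocal of the number of sequences (itself included) whose identity to it reaches the threshold, and NEFF is the sum of the weights divided by the chosen normalization factor. The function names, the toy alignment, and the identity measure (matching columns over the full alignment length, with a `>=` comparison) are assumptions for illustration, not NEFFy's implementation; gap handling, non-standard letters, and the symmetric/asymmetric distinction are ignored.

```python
import math


def sequence_weights(msa, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at least `threshold` identical to it)."""
    length = len(msa[0])
    weights = []
    for seq in msa:
        neighbours = 0
        for other in msa:
            matches = sum(a == b for a, b in zip(seq, other))
            if matches / length >= threshold:
                neighbours += 1  # always counts the sequence itself
        weights.append(1.0 / neighbours)
    return weights


def neff(msa, threshold=0.8, norm="sqrt_length"):
    """Sum the sequence weights and apply one of the three normalizations."""
    length = len(msa[0])
    total = sum(sequence_weights(msa, threshold))
    if norm == "sqrt_length":
        return total / math.sqrt(length)
    if norm == "length":
        return total / length
    return total  # no normalization


toy_msa = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(sequence_weights(toy_msa))    # [0.5, 0.5, 1.0]
print(neff(toy_msa, norm="none"))   # 2.0
```

The first two toy sequences are 90% identical and share their weight, while the third is unrelated and keeps a weight of 1.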

Examples:

  • Compute Symmetric NEFF for protein MSA (No Normalization):
    ./neff --file=../MSAs/example.a2m --norm=2
    Result:

    MSA sequence length: 56
    MSA depth: 5
    NEFF: 5


  • Compute NEFF for RNA MSA with Default Normalization (sqrt(L)) using 50% Gap Cutoff:
    ./neff --file=../MSAs/rna.fasta --gap_cutoff=0.5 --alphabet=1
    Result:

    MSA sequence length: 28
    MSA depth: 1176
    NEFF: 5.08477


  • Compute Sequence Weights for protein MSA with Default Normalization (sqrt(L)):
    ./neff --file=../MSAs/example.a2m --only_weights=true --threshold=0.2
    Result:

    MSA sequence length: 56
    MSA depth: 5
    Sequence weights:
    1 0.333333 0.25 0.333333 0.25


  • Compute per-residue (column-wise) NEFF:
    ./neff --file=../MSAs/example.a2m --check_validation=true --residue_neff=true
    Result:

    MSA sequence length: 56
    MSA depth: 5
    Per-residue (column-wise) NEFF:
    0.668153 0.534522 0.668153 0.534522 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.668153 0.400892 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.534522 0.534522 0.400892 0.534522 0.534522 0.534522 0.400892 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631
    Median of per-residue (column-wise) NEFF: 0.668153


  • Compute Asymmetric NEFF for an Integration of MSAs with Default Normalization (sqrt(L)):
    ./neff --file=../MSAs/uniref90_hits.sto,../MSAs/bfd_uniclust_hits.a3m,../MSAs/mgnify_hits.sto --depth=2048 --is_symmetric=false
    Result:

    MSA length: 338
    MSA depth: 2048
    NEFF: 106.067

The tool will integrate MSAs from the list of files in the specified order and compute the NEFF value for the integrated MSA up to a depth of 2048. If the tool reaches the specified depth before all files are integrated, it will stop processing the remaining files.
This process is akin to how AlphaFold2.3 produces the final MSA for a protein by combining three distinct MSAs.


  • Compute NEFF for MSA of a Homomer (Default Normalization):
    ./neff --file=../MSAs/homomer.aln --multimer_MSA=true --stoichiom=A2
    Result:

    MSA sequence length: 102
    MSA depth: 2048
    NEff of entire MSA: 34.8453
    Neff of individual MSA: 49.2787

The tool will detect a single repetition of the individual MSA of the chain within the given MSA file and report the NEFF value for it. If the provided MSA is not in the format of a multimer MSA of a homomer (i.e., the same MSA file repeated 'n' times, where here n=2), the tool will raise an error.

  • Compute NEFF for MSA of a Heteromer (L Normalization):
    ./neff --file=../MSAs/heteromer.aln --multimer_MSA=true --stoichiom=A2B1 --chain_length=51,73 --norm=1
    Result:

    MSA sequence length: 175
    MSA depth: 3072
    NEFF of entire MSA:3.2836
    NEFF of Paired MSA (depth=1024): 3.02721
    NEFF of Individual MSA for Chain A (depth=1024): 4.37843
    NEFF of Individual MSA for Chain B (depth=1024): 3.0194

The tool will identify paired MSA sequences and the sequences for each individual MSA of chains in the given MSA file, assuming the first monomer has a length of 51 residues and the second has a length of 73. It will report NEFF values for the paired MSA as well as for the MSAs corresponding to unpaired sequences. If the provided MSA is not in the format of a multimer MSA of a heteromer, the tool will raise an error.

MSA File Conversion

To convert an MSA file, specify the input file, output file, and the desired input and output formats. The tool will read the input file, perform the conversion, and write the resulting MSA to the output file in the specified format.

Usage:

./converter --in_file=<input_file> --out_file=<output_file> [options]

Parameters:

The code accepts the following command-line flags:

Flag Description Required Default Value Example
--in_file=<filename> Specifies the input MSA file to be converted.
Replace <filename> with the path and name of the input file
Yes N/A --in_file=input.fasta
--out_file=<filename> Specifies the output file where the converted MSA will be saved.
Replace <filename> with the desired path and name of the output file
Yes N/A --out_file=output.a2m
--alphabet=<value> Alphabet of MSA
0: Protein
1: RNA
2: DNA
No 0 --alphabet=1
--check_validation=[true/false] Validate the input MSA file based on alphabet or not No true --check_validation=true

Please note that the conversion is performed based on the specified input and output file extensions.
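Because the formats are inferred from the file extensions, a script can check a path before calling the converter. In the sketch below, the lookup table contains only the extensions that appear in the examples on this page; the table, the informal format labels, and the function name `guess_format` are illustrative assumptions, and the converter itself may accept other extensions.

```python
from pathlib import Path

# Extensions taken from the examples on this page; the labels are informal.
KNOWN_EXTENSIONS = {
    ".fasta": "FASTA",
    ".a2m": "A2M",
    ".a3m": "A3M",
    ".sto": "Stockholm",
    ".clustal": "Clustal",
    ".aln": "ALN",
}


def guess_format(path):
    """Return the format label implied by the file extension, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        raise ValueError(f"unrecognized MSA extension: {suffix!r}")
    return KNOWN_EXTENSIONS[suffix]


print(guess_format("../MSAs/example.sto"))   # Stockholm
```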

Examples:

  • Convert an A2M file to Stockholm format with RNA alphabet:
    ./converter --in_file=../MSAs/example.a2m --out_file=../MSAs/example.sto --alphabet=1
  • Convert a STO file to Clustal format without validation:
    ./converter --in_file=../MSAs/example.sto --out_file=../MSAs/example.clustal --check_validation=false
  • Convert a FASTA file to ALN format with RNA alphabet:
    ./converter --in_file=../MSAs/rna.fasta --out_file=../MSAs/rna.aln --alphabet=1



Python Library

NEFF Computation

compute_neff

Parameters:

The method accepts the following parameters:

Parameter Type Required Default Value Description
file list [string] Yes N/A Path to the input file containing the multiple sequence alignment (MSA)
only_weights bool No False Return only sequence weights, rather than the final NEFF
alphabet Alphabet (Enum) No Alphabet.Protein Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA)
check_validation bool No False Validate the input MSA file based on alphabet or not
threshold float No 0.8 Similarity threshold for sequence weighting, must be between 0 and 1
norm Normalization (Enum) No Normalization.Sqrt_Length Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization)
omit_query_gaps bool No True Omit gap positions of query sequence from all sequences for NEFF computation
is_symmetric bool No True Whether the sequence similarity cutoff is computed symmetrically (True: gaps are not considered in the number of differences) or asymmetrically (False: gaps are considered in the number of differences)
non_standard_option NonStandardOption (Enum) No NonStandardOption.AsStandard Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap)
depth int No inf (consider the whole sequence) Depth of MSA to be used in NEFF computation (starting from the first sequence)
gap_cutoff float No 1 (no gappy position) Threshold for considering a position as gappy and removing that (between 0 and 1)
pos_start int No 1 (the first position) Start position of each sequence to be considered in NEFF (inclusive)
pos_end int No inf (consider the whole sequence) Last position of each sequence to be considered in NEFF (inclusive)
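The column-related parameters (pos_start, pos_end, gap_cutoff, omit_query_gaps) all decide which alignment columns enter the computation. The sketch below shows one plausible way to combine them; the function name, the order in which the filters are applied, and the rule that a column is dropped only when its gap fraction strictly exceeds gap_cutoff are assumptions, not a description of NEFFy's internals.

```python
def select_columns(msa, pos_start=1, pos_end=None, gap_cutoff=1.0, omit_query_gaps=True):
    """Return the MSA restricted to the columns that survive the filters."""
    length = len(msa[0])
    # pos_start and pos_end are 1-based and inclusive; pos_end is capped at the alignment length.
    last = length if pos_end is None else min(pos_end, length)
    kept = []
    for col in range(pos_start - 1, last):
        if omit_query_gaps and msa[0][col] == "-":
            continue  # the query (first sequence) has a gap here
        gap_fraction = sum(seq[col] == "-" for seq in msa) / len(msa)
        if gap_fraction > gap_cutoff:
            continue  # assumed rule: drop columns whose gap fraction exceeds the cutoff
        kept.append(col)
    return ["".join(seq[c] for c in kept) for seq in msa]


toy_msa = [
    "AC-DEF",
    "A--DEF",
    "ACG--F",
    "-CG-EF",
]
print(select_columns(toy_msa, gap_cutoff=0.4))   # ['ACEF', 'A-EF', 'AC-F', '-CEF']
```

In the toy alignment, the third column is removed because the query has a gap there, and the fourth because half of its entries are gaps.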

Examples:

  • Compute Symmetric NEFF for protein MSA with No Normalization:
    import neffy

    def main():
        try:
            # norm=No_Normalization reports the plain sum of sequence weights
            msa_length, msa_depth, neff = neffy.compute_neff(
                file='../MSAs/example.a2m',
                norm=neffy.Normalization.No_Normalization
            )
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"NEFF: {neff}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 56
    MSA depth: 5
    NEFF: 5.0


  • Compute NEFF for RNA MSA with Default Normalization (sqrt(L)) using 50% Gap Cutoff:
    import neffy

    def main():
        try:
            # RNA alphabet, default sqrt(L) normalization, 50% gap cutoff
            msa_length, msa_depth, neff = neffy.compute_neff(
                file='../MSAs/rna.fasta',
                alphabet=neffy.Alphabet.RNA,
                gap_cutoff=0.5)
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"NEFF: {neff}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 28
    MSA depth: 1176
    NEFF: 5.08477


  • Compute Sequence Weights for protein MSA with Default Normalization (sqrt(L)):
    import neffy

    def main():
        try:
            # only_weights=True returns the per-sequence weights in place of NEFF
            msa_length, msa_depth, weights = neffy.compute_neff(
                file='../MSAs/example.a2m',
                only_weights=True,
                threshold=0.2)
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"Sequence weights: {weights}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 56
    MSA depth: 5
    Sequence weights: [1.0, 0.333333, 0.25, 0.333333, 0.25]


  • Compute Asymmetric NEFF for an Integration of MSAs with Default Normalization (sqrt(L)):
    import neffy

    def main():
        try:
            # The files are integrated in the listed order, up to the given depth
            msa_length, msa_depth, neff = neffy.compute_neff(
                file=['../MSAs/uniref90_hits.sto', '../MSAs/bfd_uniclust_hits.a3m', '../MSAs/mgnify_hits.sto'],
                is_symmetric=False,
                depth=2048)
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"NEFF: {neff}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 338
    MSA depth: 2048
    NEFF: 106.067

The tool will integrate MSAs from the list of files in the specified order and compute the NEFF value for the integrated MSA up to the given depth. If the tool reaches the specified depth before all files are integrated, it will stop processing the remaining files.
This process is akin to how AlphaFold2.3 produces the final MSA for a protein by combining three distinct MSAs.
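The integration step described above can be pictured as ordered concatenation with an early stop. This is a minimal sketch under that reading only: `integrate_msas` is a hypothetical name, the alignments are plain lists of already-parsed sequences, and any deduplication or query handling the tool may perform is not modeled.

```python
def integrate_msas(msas, depth=None):
    """Concatenate alignments in the given order, stopping once `depth` sequences are collected."""
    merged = []
    for msa in msas:
        for seq in msa:
            if depth is not None and len(merged) >= depth:
                return merged  # remaining sequences and files are skipped
            merged.append(seq)
    return merged


first = ["ACDE", "ACDF"]
second = ["ACDG", "ACDH", "ACDI"]
third = ["ACDK"]
print(integrate_msas([first, second, third], depth=4))   # ['ACDE', 'ACDF', 'ACDG', 'ACDH']
```

With a depth of 4, the last sequence of the second alignment and the whole third alignment are never read.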


compute_multimer_neff

Parameters:

The method accepts the following parameters:

Parameter Type Required Default Value Description
file string Yes N/A Path to the input file containing the multiple sequence alignment (MSA)
stoichiom string Yes - Stoichiometry of the multimer
chain_length list [int] Only when the multimer is a heteromer 0 Length of the chains in a heteromer
alphabet Alphabet (Enum) No Alphabet.Protein Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA)
check_validation bool No False Validate the input MSA file based on alphabet or not
threshold float No 0.8 Similarity threshold for sequence weighting, must be between 0 and 1
norm Normalization (Enum) No Normalization.Sqrt_Length Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization)
omit_query_gaps bool No True Omit gap positions of query sequence from all sequences for NEFF computation
is_symmetric bool No True Whether the sequence similarity cutoff is computed symmetrically (True: gaps are not considered in the number of differences) or asymmetrically (False: gaps are considered in the number of differences)
non_standard_option NonStandardOption (Enum) No NonStandardOption.AsStandard Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap)
depth int No inf (consider the whole sequence) Depth of MSA to be used in NEFF computation (starting from the first sequence)
gap_cutoff float No 1 (no gappy position) Threshold for considering a position as gappy and removing that (between 0 and 1)
pos_start int No 1 (the first position) Start position of each sequence to be considered in NEFF (inclusive)
pos_end int No inf (consider the whole sequence) Last position of each sequence to be considered in NEFF (inclusive)

Examples:

  • Compute NEFF for MSA of a Homomer (Default Normalization):
    import neffy

    def main():
        try:
            # stoichiom="A2": the MSA holds two copies of the same chain
            msa_length, msa_depth, entire_neff, neff_values = neffy.compute_multimer_neff(
                file='../MSAs/homomer.aln',
                stoichiom="A2"
            )
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"NEFF of entire MSA: {entire_neff}")
            print(f"NEFF of individual MSA: {neff_values}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 102
    MSA depth: 2048
    NEFF of entire MSA: 34.8453
    NEFF of individual MSA: 49.2787

The tool will detect a single repetition of the individual MSA of the chain within the given MSA file and report the NEFF value for it. If the provided MSA is not in the format of a multimer MSA of a homomer (i.e., the same MSA file repeated 'n' times given in 'stoichiom'), the tool will raise an error.
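One way to picture the homomer check: each row of the multimer MSA is expected to be the same chain alignment repeated side by side, so splitting a row into equal blocks should yield identical pieces. The sketch below assumes that layout (it is consistent with the example above, where a length of 102 corresponds to two copies of a 51-column alignment); the function names are hypothetical and the real validation may differ.

```python
def split_homomer_row(row, copies):
    """Split one row of a homomer MSA into `copies` equal blocks and check they are identical."""
    if len(row) % copies != 0:
        raise ValueError("row length is not a multiple of the number of copies")
    block = len(row) // copies
    parts = [row[i * block:(i + 1) * block] for i in range(copies)]
    if any(part != parts[0] for part in parts):
        raise ValueError("row is not the same chain repeated")
    return parts[0]


def individual_msa(msa, copies):
    """Recover the single-chain MSA from a homomer MSA (stoichiometry A<copies>)."""
    return [split_homomer_row(row, copies) for row in msa]


homomer_msa = ["ACDACD", "A-DA-D", "GCDGCD"]
print(individual_msa(homomer_msa, copies=2))   # ['ACD', 'A-D', 'GCD']
```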

  • Compute NEFF for MSA of a Heteromer (L Normalization):
    import neffy

    def main():
        try:
            # chain_length lists the length of chain A and chain B, in that order
            msa_length, msa_depth, entire_neff, neff_values = neffy.compute_multimer_neff(
                file='../MSAs/heteromer.aln',
                stoichiom="A2B1",
                chain_length=[51, 73],
                norm=neffy.Normalization.Length)
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print(f"NEff of entire MSA: {entire_neff}")
            print(f"NEFF values: {neff_values}")
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 175
    MSA depth: 3072
    NEff of entire MSA: 3.2836
    NEFF values: {'paired': {'neff': 3.02721, 'depth': 1024}, 'A': {'neff': 4.37843, 'depth': 1024}, 'B': {'neff': 3.0194, 'depth': 1024}}

The tool will identify paired MSA sequences and the sequences for each individual MSA of chains in the given MSA file, based on the given lengths in 'chain_length'. It will report NEFF values for the paired MSA as well as for the MSAs corresponding to unpaired sequences. If the provided MSA is not in the format of a multimer MSA of a heteromer, the tool will raise an error.
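The split into paired and per-chain rows can be sketched as follows, assuming a layout in which paired rows carry residues for every chain while unpaired rows carry residues for one chain and gaps everywhere else. That layout, the helper names, and the toy data are assumptions for illustration; the row length check mirrors the example above, where A2B1 with chain lengths 51 and 73 gives 2 x 51 + 73 = 175 columns.

```python
import re


def parse_stoichiometry(stoichiom):
    """Turn a string such as 'A2B1' into [('A', 2), ('B', 1)]."""
    return [(name, int(count)) for name, count in re.findall(r"([A-Za-z])(\d+)", stoichiom)]


def classify_rows(msa, stoichiom, chain_length):
    """Group rows into 'paired' or a chain name, based on which chain segments hold residues."""
    chains = parse_stoichiometry(stoichiom)
    # Column span occupied by all copies of each chain type, in order.
    spans, start = {}, 0
    for (name, copies), length in zip(chains, chain_length):
        spans[name] = (start, start + copies * length)
        start += copies * length
    groups = {"paired": []}
    groups.update({name: [] for name in spans})
    for row in msa:
        if len(row) != start:
            raise ValueError("row length does not match stoichiometry and chain lengths")
        present = [name for name, (lo, hi) in spans.items() if set(row[lo:hi]) != {"-"}]
        if len(present) == len(spans):
            groups["paired"].append(row)
        elif len(present) == 1:
            groups[present[0]].append(row)
        else:
            raise ValueError("row is neither paired nor specific to one chain")
    return groups


toy_msa = ["ACACDEF", "GCGCDEY", "ACAC---", "----DEW"]
groups = classify_rows(toy_msa, "A2B1", [2, 3])
print({name: len(rows) for name, rows in groups.items()})   # {'paired': 2, 'A': 1, 'B': 1}
```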

compute_residue_neff

Parameters:

The method accepts the following parameters:

Parameter Type Required Default Value Description
file list [string] Yes N/A Path to the input file containing the multiple sequence alignment (MSA)
alphabet Alphabet (Enum) No Alphabet.Protein Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA)
check_validation bool No False Validate the input MSA file based on alphabet or not
threshold float No 0.8 Similarity threshold for sequence weighting, must be between 0 and 1
norm Normalization (Enum) No Normalization.Sqrt_Length Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization)
omit_query_gaps bool No True Omit gap positions of query sequence from all sequences for NEFF computation
is_symmetric bool No True Whether the sequence similarity cutoff is computed symmetrically (True: gaps are not considered in the number of differences) or asymmetrically (False: gaps are considered in the number of differences)
non_standard_option NonStandardOption (Enum) No NonStandardOption.AsStandard Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap)
depth int No inf (consider the whole sequence) Depth of MSA to be used in NEFF computation (starting from the first sequence)
gap_cutoff float No 1 (no gappy position) Threshold for considering a position as gappy and removing that (between 0 and 1)
pos_start int No 1 (the first position) Start position of each sequence to be considered in NEFF (inclusive)
pos_end int No inf (consider the whole sequence) Last position of each sequence to be considered in NEFF (inclusive)

Examples:

  • Compute per-residue (column-wise) NEFF:
    import neffy

    def main():
        try:
            # Returns one NEFF value per alignment column plus their median
            msa_length, msa_depth, residue_neffs, median = neffy.compute_residue_neff(
                file='../MSAs/example.a2m',
                check_validation=True)
            print(f"MSA length: {msa_length}")
            print(f"MSA depth: {msa_depth}")
            print("Per-residue (column-wise) NEFF:\n", residue_neffs)
            print("Median of per-residue (column-wise) NEFF:", median)
        except RuntimeError as e:
            print(e)

    if __name__ == "__main__":
        main()
    Result:

    MSA length: 56
    MSA depth: 5
    Per-residue (column-wise) NEFF:
    [0.668153, 0.534522, 0.668153, 0.534522, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.668153, 0.400892, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.534522, 0.534522, 0.400892, 0.534522, 0.534522, 0.534522, 0.400892, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631]
    Median of per-residue (column-wise) NEFF: 0.668153
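The values in this result are all multiples of 1/sqrt(56) (for instance, 0.668153 is 5/sqrt(56)), which is consistent with a per-column sum of the weights of the sequences that have a residue in that column, divided by sqrt(L). The sketch below implements that reading as an assumption; it is inferred from the example output rather than taken from NEFFy's source, and the function name and toy data are hypothetical.

```python
import math
from statistics import median


def residue_neff(msa, weights):
    """Per-column NEFF: sum of weights of sequences with a residue in the column, over sqrt(L)."""
    length = len(msa[0])
    scale = math.sqrt(length)
    values = []
    for col in range(length):
        covered = sum(w for seq, w in zip(msa, weights) if seq[col] != "-")
        values.append(covered / scale)
    return values


toy_msa = ["ACDE", "AC-E", "A--E", "-C-E"]
weights = [1.0, 1.0, 1.0, 1.0]   # e.g. all sequences are mutually dissimilar
values = residue_neff(toy_msa, weights)
print(values)           # [1.5, 1.5, 0.5, 2.0]
print(median(values))   # 1.5
```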


MSA File Conversion

convert_msa

Parameters:

The method accepts the following parameters:

Parameter Type Required Default Value Description
in_file string Yes N/A Path to the input MSA file that needs to be converted
out_file string Yes N/A Path where the converted MSA file will be saved
alphabet Alphabet (Enum) No Alphabet.Protein Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA)
check_validation bool No True Whether to check the validation of the MSA file before conversion

Example:

import neffy

def main():
    try:
        # The input and output formats are inferred from the file extensions
        output = neffy.convert_msa(
            in_file='../MSAs/example.a2m',
            out_file='../MSAs/example.sto')
        print(output)
    except RuntimeError as e:
        print(e)

if __name__ == "__main__":
    main()



**** All example MSAs used here can be found in the 'MSA' folder of the GitHub repository.

**** All Python scripts used in the examples can be found in the 'example' folder of the GitHub repository.



For further assistance or inquiries, please contact the developer or create an issue in the GitHub repository.
