To calculate NEFF, provide an MSA file and indicate the desired parameters. NEFF will be computed and presented as the result.
The code accepts the following command-line flags:
Flag | Description | Required | Default Value | Example |
---|---|---|---|---|
--file=<list of filenames> | Input files (comma-separated, no spaces) containing multiple sequence alignments | Yes | N/A | --file=my_alignment.fasta |
--alphabet=<value> | Alphabet of the MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | --alphabet=1 |
--check_validation=[true/false] | Whether to validate the input MSA file against the specified alphabet | No | false | --check_validation=true |
--threshold=<value> | Similarity threshold for sequence weighting, must be between 0 and 1. | No | 0.8 | --threshold=0.7 |
--norm=<value> | Normalization option for NEFF. 0: normalize by the square root of the sequence length; 1: normalize by the sequence length; 2: no normalization | No | 0 | --norm=2 |
--omit_query_gaps=[true/false] | Omit gap positions of query sequence from all sequences for NEFF computation | No | true | --omit_query_gaps=true |
--is_symmetric=[true/false] | Whether gaps are left out of the number of differences (symmetric, true) or included in it (asymmetric, false) when computing the sequence similarity cutoff | No | true | --is_symmetric=false |
--non_standard_option=<value> | Options for handling non-standard letters of the specified alphabet. 0: treat them the same as standard letters; 1: consider them as gaps when computing the similarity cutoff of sequences (only used in the asymmetric version); 2: consider them as gaps both when computing the similarity cutoff and when checking positions for match/mismatch | No | 0 | --non_standard_option=1 |
--depth=<value> | Depth of MSA to be used in NEFF computation (starting from the first sequence) | No | inf (consider the whole sequence) | --depth=10 (if given value is greater than original depth, it considers the original depth) |
--gap_cutoff=<value> | Threshold (between 0 and 1) for considering a position gappy and removing it | No | 1 (no gappy position) | --gap_cutoff=0.7 |
--pos_start=<value> | Start position of each sequence to be considered in NEFF (inclusive) | No | 1 (the first position) | --pos_start=10 |
--pos_end=<value> | Last position of each sequence to be considered in NEFF (inclusive) | No | inf (consider the whole sequence) | --pos_end=50 (if given value is greater than the length of the MSA sequences, consider length of sequences in the MSA) |
--only_weights=[true/false] | Return only sequence weights, rather than the final NEFF | No | false | --only_weights=true |
--multimer_MSA=[true/false] | Compute NEFF for MSA of a multimer | No | false | --multimer_MSA=true |
--stoichiom=<value> | Stoichiometry of the multimer | When multimer_MSA=true | N/A | --stoichiom=A2B1 |
--chain_length=<list of values> | Lengths of the chains in a heteromer | When multimer_MSA=true and the multimer is a heteromer | 0 | --chain_length=17 45 |
--residue_neff=[true/false] | Compute per-residue (column-wise) NEFF | No | false | --residue_neff=true |
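
To make the roles of `--threshold` and `--norm` concrete, here is a minimal sketch of the commonly used NEFF definition: each sequence is weighted by the inverse of the number of sequences that are at least `threshold` identical to it, and NEFF is the sum of the weights divided by the chosen normalization. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, identity is a plain per-column comparison with `>=` against the threshold, and gap handling, non-standard letters, and the symmetric/asymmetric distinction are ignored.

```python
import math

def sequence_weights(msa, threshold=0.8):
    """Weight of a sequence = 1 / (number of sequences at least `threshold` identical to it)."""
    length = len(msa[0])
    weights = []
    for a in msa:
        # Count neighbours of `a`, including `a` itself.
        neighbours = sum(
            1
            for b in msa
            if sum(x == y for x, y in zip(a, b)) / length >= threshold
        )
        weights.append(1.0 / neighbours)
    return weights

def neff(msa, threshold=0.8, norm=0):
    """Sum of sequence weights, normalized as selected by `norm` (0, 1, or 2)."""
    total = sum(sequence_weights(msa, threshold))
    length = len(msa[0])
    if norm == 0:   # square root of the sequence length
        return total / math.sqrt(length)
    if norm == 1:   # sequence length
        return total / length
    return total    # no normalization

toy_msa = ["ACDE", "ACDE", "ACDF", "WWWW"]
print(sequence_weights(toy_msa))  # [0.5, 0.5, 1.0, 1.0]
print(neff(toy_msa, norm=2))      # 3.0
```

In the toy alignment the two identical sequences share their weight, while `ACDF` (75% identical to `ACDE`, below the 0.8 threshold) keeps a weight of 1.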
MSA sequence length: 56
MSA depth: 5
NEFF: 5
MSA sequence length: 28
MSA depth: 1176
NEFF: 5.08477
MSA sequence length: 56
MSA depth: 5
Sequence weights:
1 0.333333 0.25 0.333333 0.25
MSA sequence length: 56
MSA depth: 5
Per-residue (column-wise) NEFF:
0.668153 0.534522 0.668153 0.534522 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.668153 0.400892 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.668153 0.534522 0.534522 0.534522 0.534522 0.534522 0.400892 0.534522 0.534522 0.534522 0.400892 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631 0.133631
Median of per-residue (column-wise) NEFF: 0.668153
MSA length: 338
MSA depth: 2048
NEFF: 106.067
The tool will integrate MSAs from the list of files in the specified order and compute the NEFF value for the integrated MSA up to a depth of 2048. If the tool reaches the specified depth before all files are integrated, it will stop processing the remaining files.
This process is akin to how AlphaFold2.3 produces the final MSA for a protein by combining three distinct MSAs.
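
The integration rule described above can be sketched as follows. This is a simplified illustration, not the tool's code: `integrate_msas` is a hypothetical helper that works on already-parsed lists of sequences rather than on files, and it assumes sequences are simply appended in order without any deduplication.

```python
def integrate_msas(msas, depth):
    """Append sequences from each MSA in order, stopping once `depth` is reached."""
    merged = []
    for msa in msas:
        for seq in msa:
            if len(merged) >= depth:
                # Depth reached: remaining sequences and files are skipped.
                return merged
            merged.append(seq)
    return merged

first = ["AAAA", "AAAC"]
second = ["AAGA", "AAGC"]
third = ["TTTT"]
print(integrate_msas([first, second, third], depth=3))  # ['AAAA', 'AAAC', 'AAGA']
```

With `depth=3` the third list is never read, which mirrors the behaviour of stopping before all files are integrated.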
MSA sequence length: 102
MSA depth: 2048
NEff of entire MSA: 34.8453
Neff of individual MSA: 49.2787
The tool will detect a single repetition of the individual MSA of the chain within the given MSA file and report the NEFF value for it. If the provided MSA is not in the format of a multimer MSA of a homomer (i.e., the same MSA repeated n times; here n=2), the tool will raise an error.
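
One way to picture the homomer check is the sketch below. It assumes the repetition is column-wise, so that every row of the multimer MSA is its first `length / n` columns repeated `n` times; the helper name `extract_single_repeat` is hypothetical, and the tool's actual check may differ. The example output above is consistent with this reading: the MSA length of 102 is even, and 49.2787 / 34.8453 is approximately the square root of 2, which is what square-root length normalization gives when the length halves and the sequence weights stay the same.

```python
def extract_single_repeat(msa, n):
    """Return the individual MSA if every row is one block repeated `n` times."""
    length = len(msa[0])
    if length % n != 0:
        raise ValueError(f"MSA length {length} is not divisible by {n}")
    unit = length // n
    individual = []
    for row in msa:
        block = row[:unit]
        if block * n != row:
            raise ValueError("row is not a repetition of a single chain block")
        individual.append(block)
    return individual

homodimer_msa = ["ACDACD", "A-DA-D", "GCDGCD"]
print(extract_single_repeat(homodimer_msa, n=2))  # ['ACD', 'A-D', 'GCD']
```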
MSA sequence length: 175
MSA depth: 3072
NEFF of entire MSA:3.2836
NEFF of Paired MSA (depth=1024): 3.02721
NEFF of Individual MSA for Chain A (depth=1024): 4.37843
NEFF of Individual MSA for Chain B (depth=1024): 3.0194
The tool will identify paired MSA sequences and the sequences for each individual MSA of chains in the given MSA file, assuming the first monomer has a length of 51 residues and the second has a length of 73. It will report NEFF values for the paired MSA as well as for the MSAs corresponding to unpaired sequences. If the provided MSA is not in the format of a multimer MSA of a heteromer, the tool will raise an error.
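
The separation into paired and per-chain sequences can be sketched as below. The layout is an assumption made for illustration: unpaired rows are taken to contain only gaps in the columns of the other chains, and any row with residues in more than one chain segment is treated as paired. The helper `split_heteromer_rows` and the chain naming (A, B, ...) are hypothetical and not taken from the tool.

```python
from string import ascii_uppercase

def split_heteromer_rows(msa, chain_lengths, gap="-"):
    """Group rows into 'paired' and per-chain unpaired sequences."""
    total = sum(chain_lengths)
    # Column boundaries of each chain segment.
    bounds, start = [], 0
    for n in chain_lengths:
        bounds.append((start, start + n))
        start += n
    groups = {"paired": []}
    groups.update({ascii_uppercase[i]: [] for i in range(len(chain_lengths))})
    for row in msa:
        if len(row) != total:
            raise ValueError("row length does not match the sum of chain lengths")
        segments = [row[s:e] for s, e in bounds]
        covered = [i for i, seg in enumerate(segments) if set(seg) != {gap}]
        if not covered:
            raise ValueError("row contains only gaps")
        if len(covered) == 1:
            # Residues in exactly one chain segment: unpaired sequence of that chain.
            i = covered[0]
            groups[ascii_uppercase[i]].append(segments[i])
        else:
            groups["paired"].append(row)
    return groups

toy = ["ACGTW", "AC---", "--GTW", "A-G-W"]
print(split_heteromer_rows(toy, chain_lengths=[2, 3]))
# {'paired': ['ACGTW', 'A-G-W'], 'A': ['AC'], 'B': ['GTW']}
```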
To convert an MSA file, specify the input file, output file, and the desired input and output formats. The tool will read the input file, perform the conversion, and write the resulting MSA to the output file in the specified format.
The code accepts the following command-line flags:
Flag | Description | Required | Default Value | Example |
---|---|---|---|---|
--in_file=<filename> | Specifies the input MSA file to be converted. Replace <filename> with the path and name of the input file | Yes | N/A | --in_file=input.fasta |
--out_file=<filename> | Specifies the output file where the converted MSA will be saved. Replace <filename> with the desired path and name of the output file | Yes | N/A | --out_file=output.a2m |
--alphabet=<value> | Alphabet of the MSA. 0: Protein; 1: RNA; 2: DNA | No | 0 | --alphabet=1 |
--check_validation=[true/false] | Whether to validate the input MSA file against the specified alphabet | No | true | --check_validation=true |
Please note that the conversion is performed based on the specified input and output file extensions.
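
Extension-based format selection can be pictured with the short sketch below. The mapping is hypothetical and only covers the two extensions that appear in the examples above (`.fasta` and `.a2m`); the set of formats the tool actually supports may be larger.

```python
from pathlib import Path

# Hypothetical mapping for illustration only.
EXTENSION_TO_FORMAT = {".fasta": "fasta", ".a2m": "a2m"}

def infer_format(filename):
    """Infer the MSA format from the file extension (case-insensitive)."""
    ext = Path(filename).suffix.lower()
    if ext not in EXTENSION_TO_FORMAT:
        raise ValueError(f"unsupported MSA extension: {ext!r}")
    return EXTENSION_TO_FORMAT[ext]

print(infer_format("input.fasta"))  # fasta
print(infer_format("output.a2m"))   # a2m
```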
compute_neff
The method accepts the following parameters:
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
file | list [string] | Yes | N/A | Path(s) to the input file(s) containing the multiple sequence alignment (MSA) |
only_weights | bool | No | False | Return only sequence weights, rather than the final NEFF |
alphabet | Alphabet (Enum) | No | Alphabet.Protein | Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA) |
check_validation | bool | No | False | Whether to validate the input MSA file against the specified alphabet |
threshold | float | No | 0.8 | Similarity threshold for sequence weighting, must be between 0 and 1 |
norm | Normalization (Enum) | No | Normalization.Sqrt_Length | Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization) |
omit_query_gaps | bool | No | True | Omit gap positions of query sequence from all sequences for NEFF computation |
is_symmetric | bool | No | True | Whether gaps are left out of the number of differences (symmetric, True) or included in it (asymmetric, False) when computing the sequence similarity cutoff |
non_standard_option | NonStandardOption (Enum) | No | NonStandardOption.AsStandard | Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap) |
depth | int | No | inf (consider the whole sequence) | Depth of MSA to be used in NEFF computation (starting from the first sequence) |
gap_cutoff | float | No | 1 (no gappy position) | Threshold (between 0 and 1) for considering a position gappy and removing it |
pos_start | int | No | 1 (the first position) | Start position of each sequence to be considered in NEFF (inclusive) |
pos_end | int | No | inf (consider the whole sequence) | Last position of each sequence to be considered in NEFF (inclusive) |
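
The `depth`, `pos_start`, and `pos_end` parameters select a sub-alignment before NEFF is computed. The sketch below illustrates that selection with 1-based, inclusive positions and with values larger than the MSA clamped to its actual depth or length. `select_region` is a hypothetical helper written for this illustration, not part of the library.

```python
def select_region(msa, depth=None, pos_start=1, pos_end=None):
    """Keep the first `depth` sequences and columns pos_start..pos_end (1-based, inclusive)."""
    length = len(msa[0])
    # None stands in for "inf"; larger values are clamped to the MSA size.
    depth = len(msa) if depth is None else min(depth, len(msa))
    pos_end = length if pos_end is None else min(pos_end, length)
    if not 1 <= pos_start <= pos_end:
        raise ValueError("pos_start must satisfy 1 <= pos_start <= pos_end")
    return [seq[pos_start - 1:pos_end] for seq in msa[:depth]]

toy_msa = ["ABCDE", "FGHIJ", "KLMNO"]
print(select_region(toy_msa, depth=2, pos_start=2, pos_end=4))  # ['BCD', 'GHI']
print(select_region(toy_msa, depth=10, pos_end=50))             # ['ABCDE', 'FGHIJ', 'KLMNO']
```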
MSA length: 56
MSA depth: 5
NEFF: 5.0
MSA length: 28
MSA depth: 1176
NEFF: 5.08477
MSA length: 56
MSA depth: 5
Sequence weights: [1.0, 0.333333, 0.25, 0.333333, 0.25]
MSA length: 338
MSA depth: 2048
NEFF: 106.067
The tool will integrate MSAs from the list of files in the specified order and compute the NEFF value for the integrated MSA up to the given depth. If the tool reaches the specified depth before all files are integrated, it will stop processing the remaining files.
This process is akin to how AlphaFold2.3 produces the final MSA for a protein by combining three distinct MSAs.
compute_multimer_neff
The method accepts the following parameters:
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
file | string | Yes | N/A | Path to the input file containing the multiple sequence alignment (MSA) |
stoichiom | string | Yes | - | Stoichiometry of the multimer |
chain_length | list [int] | When the multimer is a heteromer | 0 | Lengths of the chains in a heteromer |
alphabet | Alphabet (Enum) | No | Alphabet.Protein | Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA) |
check_validation | bool | No | False | Whether to validate the input MSA file against the specified alphabet |
threshold | float | No | 0.8 | Similarity threshold for sequence weighting, must be between 0 and 1 |
norm | Normalization (Enum) | No | Normalization.Sqrt_Length | Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization) |
omit_query_gaps | bool | No | True | Omit gap positions of query sequence from all sequences for NEFF computation |
is_symmetric | bool | No | True | Whether gaps are left out of the number of differences (symmetric, True) or included in it (asymmetric, False) when computing the sequence similarity cutoff |
non_standard_option | NonStandardOption (Enum) | No | NonStandardOption.AsStandard | Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap) |
depth | int | No | inf (consider the whole sequence) | Depth of MSA to be used in NEFF computation (starting from the first sequence) |
gap_cutoff | float | No | 1 (no gappy position) | Threshold (between 0 and 1) for considering a position gappy and removing it |
pos_start | int | No | 1 (the first position) | Start position of each sequence to be considered in NEFF (inclusive) |
pos_end | int | No | inf (consider the whole sequence) | Last position of each sequence to be considered in NEFF (inclusive) |
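
A stoichiometry string such as `A2B1` encodes each chain letter followed by its copy number. The sketch below shows one way such a string could be parsed; `parse_stoichiometry` is a hypothetical helper, and the accepted grammar (a single uppercase letter followed by digits, repeated) is an assumption based on the `A2B1` example rather than on the library's own parser.

```python
import re

def parse_stoichiometry(stoichiom):
    """Parse a string like 'A2B1' into {'A': 2, 'B': 1}."""
    parts = re.findall(r"([A-Z])(\d+)", stoichiom)
    # Reject strings with leftover characters that the pattern did not consume.
    if not parts or "".join(c + n for c, n in parts) != stoichiom:
        raise ValueError(f"invalid stoichiometry: {stoichiom!r}")
    return {chain: int(count) for chain, count in parts}

print(parse_stoichiometry("A2B1"))  # {'A': 2, 'B': 1}
print(parse_stoichiometry("A2"))    # {'A': 2}
```

Under this reading, a single chain letter corresponds to a homomer and two or more letters to a heteromer, for which `chain_length` is needed.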
MSA length: 102
MSA depth: 2048
NEFF of entire MSA: 34.8453
NEFF of individual MSA: 49.2787
The tool will detect a single repetition of the individual MSA of the chain within the given MSA file and report the NEFF value for it. If the provided MSA is not in the format of a multimer MSA of a homomer (i.e., the same MSA repeated n times, with n given in 'stoichiom'), the tool will raise an error.
MSA length: 175
MSA depth: 3072
NEff of entire MSA: 3.2836
NEFF values: {'paired': {'neff': 3.02721, 'depth': 1024}, 'A': {'neff': 4.37843, 'depth': 1024}, 'B': {'neff': 3.0194, 'depth': 1024}}
The tool will identify paired MSA sequences and the sequences for each individual MSA of chains in the given MSA file, based on the given lengths in 'chain_length'. It will report NEFF values for the paired MSA as well as for the MSAs corresponding to unpaired sequences. If the provided MSA is not in the format of a multimer MSA of a heteromer, the tool will raise an error.
compute_residue_neff
The method accepts the following parameters:
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
file | list [string] | Yes | N/A | Path(s) to the input file(s) containing the multiple sequence alignment (MSA) |
alphabet | Alphabet (Enum) | No | Alphabet.Protein | Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA) |
check_validation | bool | No | False | Whether to validate the input MSA file against the specified alphabet |
threshold | float | No | 0.8 | Similarity threshold for sequence weighting, must be between 0 and 1 |
norm | Normalization (Enum) | No | Normalization.Sqrt_Length | Enum to specify normalization method for NEFF (Sqrt_Length, Length, or No_Normalization) |
omit_query_gaps | bool | No | True | Omit gap positions of query sequence from all sequences for NEFF computation |
is_symmetric | bool | No | True | Whether gaps are left out of the number of differences (symmetric, True) or included in it (asymmetric, False) when computing the sequence similarity cutoff |
non_standard_option | NonStandardOption (Enum) | No | NonStandardOption.AsStandard | Enum to handle non-standard residues of the specified alphabet (AsStandard, ConsiderGapInCutoff, ConsiderGap) |
depth | int | No | inf (consider the whole sequence) | Depth of MSA to be used in NEFF computation (starting from the first sequence) |
gap_cutoff | float | No | 1 (no gappy position) | Threshold (between 0 and 1) for considering a position gappy and removing it |
pos_start | int | No | 1 (the first position) | Start position of each sequence to be considered in NEFF (inclusive) |
pos_end | int | No | inf (consider the whole sequence) | Last position of each sequence to be considered in NEFF (inclusive) |
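
The effect of `gap_cutoff` can be illustrated with the sketch below, which drops every column whose gap fraction exceeds the cutoff. The strict comparison is an assumption chosen so that the default of 1 removes nothing, in line with the "no gappy position" default; `drop_gappy_columns` is a hypothetical helper and the library may define the boundary case differently.

```python
def drop_gappy_columns(msa, gap_cutoff=1.0, gap="-"):
    """Remove columns whose fraction of gaps is greater than `gap_cutoff`."""
    depth = len(msa)
    keep = [
        col
        for col in range(len(msa[0]))
        if sum(seq[col] == gap for seq in msa) / depth <= gap_cutoff
    ]
    return ["".join(seq[col] for col in keep) for seq in msa]

toy_msa = ["AC-", "A--", "AC-"]
print(drop_gappy_columns(toy_msa, gap_cutoff=0.5))  # ['AC', 'A-', 'AC']
print(drop_gappy_columns(toy_msa))                  # ['AC-', 'A--', 'AC-']
```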
MSA length: 56
MSA depth: 5
Per-residue (column-wise) NEFF:
[0.668153, 0.534522, 0.668153, 0.534522, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.668153, 0.400892, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.668153, 0.534522, 0.534522, 0.534522, 0.534522, 0.534522, 0.400892, 0.534522, 0.534522, 0.534522, 0.400892, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631, 0.133631]
Median of per-residue (column-wise) NEFF: 0.668153
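
The values in this example equal k / sqrt(56) for k = 5, 4, 3, and 1 (for instance, 5 / sqrt(56) ≈ 0.668153). That is consistent with, though not proof of, a column-wise definition in which the weights of the sequences that have a residue at a position are summed and then normalized by the square root of the MSA length. The sketch below implements that reading as an assumption; `residue_neff` is a hypothetical function, and the sequence weights are passed in rather than computed.

```python
import math
from statistics import median

def residue_neff(msa, weights, gap="-"):
    """Per-column sum of weights over non-gap sequences, divided by sqrt(length)."""
    length = len(msa[0])
    scale = math.sqrt(length)
    return [
        sum(w for seq, w in zip(msa, weights) if seq[col] != gap) / scale
        for col in range(length)
    ]

toy_msa = ["ACDE", "A-DE", "AC--", "ACD-"]
values = residue_neff(toy_msa, weights=[1.0, 1.0, 1.0, 1.0])
print(values)          # [2.0, 1.5, 1.5, 1.0]
print(median(values))  # 1.5
```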
convert_msa
The method accepts the following parameters:
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
in_file | string | Yes | N/A | Path to the input MSA file that needs to be converted |
out_file | string | Yes | N/A | Path where the converted MSA file will be saved |
alphabet | Alphabet (Enum) | No | Alphabet.Protein | Enum to specify the type of sequences in the MSA (Protein, RNA, or DNA) |
check_validation | bool | No | True | Whether to validate the MSA file against the specified alphabet before conversion |
All example MSAs used here can be found in the 'MSA' folder of the GitHub repository.
All Python scripts used in the examples can be found in the 'example' folder of the GitHub repository.
For further assistance or inquiries, please contact the developer or create an issue in the GitHub repository.