NEFFy
NEFF Calculator and MSA File Converter
MSA Formats


A2M

Each sequence is represented by two lines:

  • The first line starts with '>', followed by the sequence identifier and an optional description.
  • The second line contains the aligned residues, encoded as follows:
    • Inserts as lowercase characters,
    • Matches as uppercase characters,
    • Deletions as '-', and
    • Gaps aligned to inserts as '.'.

Example

>T1152
MY...TVKPGDT......MWKIAV...K..YQI...GI.....SEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
>MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1
TY...D-KDGYR......HYRTRV...Y..YTL...RR.....NEDNALIA-REVFSQVYKKEAL-CPIA--------
>ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1
G-...EREKGR-......--HSKS...R..QEK...GF.....KEKK---P-TKKPSATNKPVNTAKPAA--------
>tr|A0A235B7N0|A0A235B7N0_9BACL Uncharacterized protein OS=Paludifilum halophilum OX=1642702
EAsavDRITSDSilenfvQWIFSE...E..KEVeekHT.....EESVQPTPAVKHSPDSSGSSKSSSSD---------
>tr|A0A1E5LFN5|A0A1E5LFN5_9BACI Uncharacterized protein OS=Bacillus solimangrovi OX=1305675
SA...KVKRGRT......FIPLRSateSfgYDV...IWkenenAVYLKSNPTIKPKDSTQ------------------
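
To make these conventions concrete, here is a minimal Python sketch (not NEFFy's own code; the names `a2m_state`, `match_column_count` and `check_a2m` are hypothetical). It maps each A2M character to its alignment state and checks the two invariants of an A2M file: every row has the same length, and every row has the same number of match columns (uppercase residues plus '-').

```python
def a2m_state(ch: str) -> str:
    """Return the alignment state encoded by one A2M character."""
    if ch == "-":
        return "deletion"       # match column, no residue in this sequence
    if ch == ".":
        return "gap_in_insert"  # insert column, no residue in this sequence
    if ch.isupper():
        return "match"
    if ch.islower():
        return "insert"
    raise ValueError(f"unexpected A2M character: {ch!r}")


def match_column_count(row: str) -> int:
    """Count match columns: uppercase residues plus '-' deletions."""
    return sum(a2m_state(ch) in ("match", "deletion") for ch in row)


def check_a2m(rows: list[str]) -> None:
    """Raise ValueError unless all rows share one length and one match-column count."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("A2M rows differ in length")
    if len({match_column_count(r) for r in rows}) > 1:
        raise ValueError("A2M rows differ in number of match columns")


# Prefixes of three rows from the example above
rows = ["MY...TVKPGDT", "TY...D-KDGYR", "EAsavDRITSDS"]
check_a2m(rows)
print([a2m_state(c) for c in rows[2][:5]])
# ['match', 'match', 'insert', 'insert', 'insert']
print(match_column_count(rows[0]))  # 9
```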



A3M

A3M is almost identical to A2M. The only difference is that gaps aligned to inserts ('.') can be omitted, which makes A3M a more compact way to represent an MSA than FASTA or A2M.

Example

>T1152
MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
>MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1
TYD-KDGYRHYRTRVYYTLRRNEDNALIA-REVFSQVYKKEAL-CPIA--------
>ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1
G-EREKGR---HSKSRQEKGFKEKK---P-TKKPSATNKPVNTAKPAA--------
>tr|A0A235B7N0|A0A235B7N0_9BACL Uncharacterized protein OS=Paludifilum halophilum OX=1642702
EAsavDRITSDSilenfvQWIFSEEKEVeekHTEESVQPTPAVKHSPDSSGSSKSSSSD---------
>tr|A0A1E5LFN5|A0A1E5LFN5_9BACI Uncharacterized protein OS=Bacillus solimangrovi OX=1305675
SAKVKRGRTFIPLRSateSfgYDVIWkenenAVYLKSNPTIKPKDSTQ------------------
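
Because the '.' characters carry no residues, A2M to A3M is a plain deletion, while A3M to A2M has to recompute how wide each insert block is. The following Python sketch (hypothetical function names, not NEFFy's implementation) does both. It assumes insert residues are left-aligned within their block and padded with '.' on the right; other tools may justify inserts differently.

```python
def a2m_to_a3m(row: str) -> str:
    """A3M is A2M with the '.' gaps aligned to inserts removed."""
    return row.replace(".", "")


def _split_a3m(row: str) -> tuple[list[str], list[str]]:
    """Split a row into match-column characters and the insert runs around them.

    inserts[i] holds the lowercase residues before match column i;
    inserts[-1] holds those after the last match column.
    """
    matches, inserts, run = [], [], []
    for ch in row:
        if ch.islower():
            run.append(ch)
        elif ch.isupper() or ch == "-":
            inserts.append("".join(run))
            run = []
            matches.append(ch)
        elif ch != ".":  # tolerate A2M input as well
            raise ValueError(f"unexpected character: {ch!r}")
    inserts.append("".join(run))
    return matches, inserts


def a3m_to_a2m(rows: list[str]) -> list[str]:
    """Re-insert '.' so each insert block has the same width in every row.

    Assumption: residues are left-aligned inside an insert block.
    """
    split = [_split_a3m(r) for r in rows]
    if len({len(m) for m, _ in split}) > 1:
        raise ValueError("rows differ in number of match columns")
    n_slots = len(split[0][1])
    widths = [max(len(ins[i]) for _, ins in split) for i in range(n_slots)]
    out = []
    for matches, inserts in split:
        parts = []
        for i, run in enumerate(inserts):
            parts.append(run.ljust(widths[i], "."))  # pad the insert block
            if i < len(matches):
                parts.append(matches[i])
        out.append("".join(parts))
    return out


print(a2m_to_a3m("MY...TVKPGDT"))  # MYTVKPGDT
print(a3m_to_a2m(["MYTVKPGDT", "TYD-KDGYR", "EAsavDRITSDS"]))
# ['MY...TVKPGDT', 'TY...D-KDGYR', 'EAsavDRITSDS']
```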



FASTA

Each sequence starts with a header line beginning with '>'; all lines up to the next '>' contain the aligned sequence, in which:

  • Lowercase and uppercase letters are equivalent;
  • '.' and '-' are equivalent;
  • All sequences have the same length, since they are aligned.

Example

>T1152
MY---TVKPGDT------MWKIAV---K--YQI---GI-----SEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
>MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1
TY---D-KDGYR------HYRTRV---Y--YTL---RR-----NEDNALIA-REVFSQVYKKEAL-CPIA--------
>ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1
G----EREKGR---------HSKS---R--QEK---GF-----KEKK---P-TKKPSATNKPVNTAKPAA--------
>tr|A0A235B7N0|A0A235B7N0_9BACL Uncharacterized protein OS=Paludifilum halophilum OX=1642702
EASAVDRITSDSILENFVQWIFSE---E--KEVEEKHT-----EESVQPTPAVKHSPDSSGSSKSSSSD---------
>tr|A0A1E5LFN5|A0A1E5LFN5_9BACI Uncharacterized protein OS=Bacillus solimangrovi OX=1305675
SA---KVKRGRT------FIPLRSATESFGYDV---IWKENENAVYLKSNPTIKPKDSTQ------------------
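
A minimal Python sketch of reading an aligned FASTA file under the rules above (function names are hypothetical, not NEFFy's API). Sequence lines are joined until the next '>', case is ignored, '.' is treated as '-', and all sequences must end up the same length. Applied to an A2M row, `normalize_aligned` yields the corresponding FASTA row of the examples.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences may span lines."""
    records: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
        else:
            raise ValueError("sequence data before the first '>' header")
    return [(header, "".join(chunks)) for header, chunks in records]


def normalize_aligned(seq: str) -> str:
    """Apply the FASTA equivalences: ignore case and treat '.' as '-'."""
    return seq.upper().replace(".", "-")


def read_aligned_fasta(text: str) -> list[tuple[str, str]]:
    """Parse, normalize, and check that all aligned sequences have equal length."""
    records = [(h, normalize_aligned(s)) for h, s in read_fasta(text)]
    if len({len(s) for _, s in records}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records


text = """>T1152
MY---TVKPG
DT
>seq2 hypothetical description
TY...d-kdg
yr
"""
for header, seq in read_aligned_fasta(text):
    print(header, seq)
# T1152 MY---TVKPGDT
# seq2 hypothetical description TY---D-KDGYR
```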



STO (Stockholm)

It consists of:

  • A header line containing format and version information.
  • Mark-up lines that start with "#=GF", "#=GC", "#=GS", or "#=GR".
  • Alignment lines featuring the sequence name and its corresponding aligned sequence. Within these lines:
    • Inserts are represented as lowercase characters,
    • Matches are indicated by uppercase characters, and
    • Gaps are denoted by either '.' or '-'.

A "//" line marks the end of the alignment. Sequences in this format are split into blocks of 200 characters.

Example

# STOCKHOLM 1.0
#=GF ID T1152
#=GS T1152
#=GS MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1
#=GS ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1
#=GS tr|A0A235B7N0|A0A235B7N0_9BACL Uncharacterized protein OS=Paludifilum halophilum OX=1642702
#=GS tr|A0A1E5LFN5|A0A1E5LFN5_9BACI Uncharacterized protein OS=Bacillus solimangrovi OX=1305675
T1152 MY---TVKPGDT------MWKIAV---K--YQI---GI-----SEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1 TY---D-KDGYR------HYRTRV---Y--YTL---RR-----NEDNALIA-REVFSQVYKKEAL-CPIA--------
ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1 G----EREKGR---------HSKS---R--QEK---GF-----KEKK---P-TKKPSATNKPVNTAKPAA--------
tr|A0A235B7N0|A0A235B7N0_9BACL EASAVDRITSDSILENFVQWIFSE---E--KEVEEKHT-----EESVQPTPAVKHSPDSSGSSKSSSSD---------
tr|A0A1E5LFN5|A0A1E5LFN5_9BACI SA---KVKRGRT------FIPLRSATESFGYDV---IWKENENAVYLKSNPTIKPKDSTQ------------------
//
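
The layout above can be sketched as a small Python writer and reader (hypothetical names, not NEFFy's implementation). Assumptions: the sequence name is the first whitespace-delimited word of the header, as in the alignment lines of the example, #=GS lines are omitted for brevity, and blocks of `block` columns (200 by default, per the text) are separated by blank lines. The reader skips mark-up lines and concatenates the chunks of each name until "//".

```python
def write_stockholm(records: list[tuple[str, str]], family_id: str, block: int = 200) -> str:
    """Write a minimal Stockholm file, wrapping sequences into blocks of `block` columns."""
    names = [header.split()[0] for header, _ in records]  # names cannot contain spaces
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    width = max(len(n) for n in names)
    lines = ["# STOCKHOLM 1.0", f"#=GF ID {family_id}"]
    for start in range(0, length, block):
        if start:
            lines.append("")  # blank line between blocks
        for name, (_, seq) in zip(names, records):
            lines.append(f"{name.ljust(width)} {seq[start:start + block]}")
    lines.append("//")
    return "\n".join(lines) + "\n"


def read_stockholm(text: str) -> dict[str, str]:
    """Concatenate the sequence chunks of each name; stop at the '//' terminator."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and #=GF/#=GS/#=GR/#=GC mark-up
        name, chunk = line.split(None, 1)
        seqs[name] = seqs.get(name, "") + chunk.strip()
    return seqs


records = [
    ("T1152", "MY---TVKPG"),
    ("tr|A0A235B7N0|A0A235B7N0_9BACL Uncharacterized protein", "EASAVDRITS"),
]
sto = write_stockholm(records, "T1152", block=4)  # tiny block to show wrapping
print(sto)
print(read_stockholm(sto))
```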



ALN

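ALN contains only the aligned sequences, one per line, with no identifiers. The first sequence is gap-free, and, as the example shows, insert residues (the lowercase letters in A2M/A3M) are dropped, so every line has the same length as the first sequence.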

Example

MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
TYD-KDGYRHYRTRVYYTLRRNEDNALIA-REVFSQVYKKEAL-CPIA--------
G-EREKGR---HSKSRQEKGFKEKK---P-TKKPSATNKPVNTAKPAA--------
EADRITSDSQWIFSEEKEVHTEESVQPTPAVKHSPDSSGSSKSSSSD---------
SAKVKRGRTFIPLRSSYDVIWAVYLKSNPTIKPKDSTQ------------------
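
Judging from the example, an ALN line is an A3M (or A2M) row with its insert residues and '.' padding removed. A minimal Python sketch under that assumption (hypothetical names, not NEFFy's API); the gap-free check on the first row follows the description above and treats that row as the query.

```python
def to_aln_row(row: str) -> str:
    """Keep only match columns: drop lowercase insert residues and '.' padding."""
    return "".join(ch for ch in row if not ch.islower() and ch != ".")


def write_aln(rows: list[str]) -> str:
    """Write one aligned sequence per line, without identifiers."""
    aln = [to_aln_row(r) for r in rows]
    if len({len(r) for r in aln}) > 1:
        raise ValueError("rows differ in number of match columns")
    if "-" in aln[0]:
        # Assumption: the first row is the query that defines the match columns.
        raise ValueError("the first sequence should be gap-free")
    return "\n".join(aln) + "\n"


print(write_aln(["MYTVKPGDT", "EAsavDRITSDS"]), end="")
# MYTVKPGDT
# EADRITSDS
```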



CLUSTAL

Clustal is the alignment format associated with the Clustal series of sequence alignment programs. A Clustal file typically begins with a header line that provides information about the alignment.

After the header, each line holds one sequence as a pair of columns:

  • The first column contains the sequence name or identifier,
  • The second column contains the aligned sequence, in which:
    • Gaps are shown as '-', and
    • Matches are shown as uppercase letters.

Sequences in this format are split into blocks of 60 characters.

Example

Generated CLUSTAL format
T1152 MY---TVKPGDT------MWKIAV---K--YQI---GI-----SEIIAANPQIKNPNLIY
MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1 TY---D-KDGYR------HYRTRV---Y--YTL---RR-----NEDNALIA-REVFSQVY
ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1 G----EREKGR---------HSKS---R--QEK---GF-----KEKK---P-TKKPSATN
tr|A0A235B7N0|A0A235B7N0_9BACL EASAVDRITSDSILENFVQWIFSE---E--KEVEEKHT-----EESVQPTPAVKHSPDSS
tr|A0A1E5LFN5|A0A1E5LFN5_9BACI SA---KVKRGRT------FIPLRSATESFGYDV---IWKENENAVYLKSNPTIKPKDSTQ
T1152 PGQKINIPNILEHHHHHH
MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1 KKEAL-CPIA--------
ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1 KPVNTAKPAA--------
tr|A0A235B7N0|A0A235B7N0_9BACL GSSKSSSSD---------
tr|A0A1E5LFN5|A0A1E5LFN5_9BACI ------------------
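
A minimal Python sketch of a Clustal-style writer (hypothetical, not NEFFy's implementation). It splits the alignment into blocks of 60 columns, pads names so every sequence starts in the same column, and separates blocks with blank lines. The header text is a placeholder, names are taken as the first word of each header, and the conservation line that Clustal programs print under each block is omitted.

```python
def write_clustal(records: list[tuple[str, str]], header: str = "CLUSTAL", block: int = 60) -> str:
    """Write records in a Clustal-style block layout."""
    names = [h.split()[0] for h, _ in records]
    lengths = {len(s) for _, s in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    width = max(len(n) for n in names) + 1  # at least one space after the name
    lines = [header, ""]
    for start in range(0, length, block):
        for name, (_, seq) in zip(names, records):
            lines.append(name.ljust(width) + seq[start:start + block])
        lines.append("")  # blank line after each block
    return "\n".join(lines)


print(write_clustal([("a x", "ABCDE"), ("bbb", "AB-DE")], block=3))
# CLUSTAL
#
# a   ABC
# bbb AB-
#
# a   DE
# bbb DE
```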



PFAM

PFAM is similar to Clustal in that each line holds a sequence identifier and its aligned sequence, separated by a tab. Unlike Clustal, however, the sequences do not start at a common column, each sequence is written on a single line rather than in 60-character blocks, and there is no header line.

Example

T1152 MY---TVKPGDT------MWKIAV---K--YQI---GI-----SEIIAANPQIKNPNLIYPGQKINIPNILEHHHHHH
MTBAKSStandDraft_2_1061841.scaffolds.fasta_scaffold367497_1 TY---D-KDGYR------HYRTRV---Y--YTL---RR-----NEDNALIA-REVFSQVYKKEAL-CPIA--------
ETNvirnome_2_130_1030620.scaffolds.fasta_scaffold104244_1 G----EREKGR---------HSKS---R--QEK---GF-----KEKK---P-TKKPSATNKPVNTAKPAA--------
tr|A0A235B7N0|A0A235B7N0_9BACL EASAVDRITSDSILENFVQWIFSE---E--KEVEEKHT-----EESVQPTPAVKHSPDSSGSSKSSSSD---------
tr|A0A1E5LFN5|A0A1E5LFN5_9BACI SA---KVKRGRT------FIPLRSATESFGYDV---IWKENENAVYLKSNPTIKPKDSTQ------------------
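
A minimal Python sketch of writing and reading this layout (hypothetical names, not NEFFy's API). The writer uses the first word of each header as the identifier followed by a tab; the reader splits each line at the first run of whitespace, so it accepts tabs or spaces.

```python
def write_pfam(records: list[tuple[str, str]]) -> str:
    """One line per sequence: identifier, a tab, then the full aligned sequence."""
    return "".join(f"{h.split()[0]}\t{s}\n" for h, s in records)


def read_pfam(text: str) -> list[tuple[str, str]]:
    """Split each non-empty line at the first run of whitespace (tab or spaces)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, seq = line.split(None, 1)
        records.append((name, seq.strip()))
    return records


text = write_pfam([("T1152", "MY---TV"), ("tr|X|Y hypothetical description", "EA-SAVD")])
print(read_pfam(text))  # [('T1152', 'MY---TV'), ('tr|X|Y', 'EA-SAVD')]
```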



For further assistance or inquiries, please contact the developer or create an issue in the GitHub repository.
