Basic protein sequence analysis

4/4/2023

Sequence comparison is used to study structural and functional conservation and evolutionary relations among sequences. The similarity/dissimilarity of biological sequences matters because of its relationship to structure and function: proteins with similar sequences usually have similar structures. The rate at which new sequences are added to the databases is increasing exponentially. Comparing these new sequences to those with known functions is a key way of understanding the biology of an organism. Thus, sequence analysis can be used to assign function to genes and proteins by studying the similarities between the compared sequences.

Many tools and techniques are available for sequence comparison. They can be classified into alignment-based methods and alignment-free methods. Alignment-based methods assign scores to the different possible alignments and pick the alignment with the highest score. A wide range of scoring systems has been proposed, such as the amino acid substitution scoring matrices PAM and BLOSUM for protein alignment. Some algorithms perform global alignment and others local alignment. BLAST and FASTA are the most widely used applications. Alignment-based methods become computationally expensive when many sequences must be aligned at the same time. Alignment-free approaches overcome the limitations of alignment-based methods, and graphical representation approaches are one family of them. Many graphical representations of DNA and protein primary sequences have been proposed. A graphical representation is usually accompanied by a numerical characterization, which yields a descriptor for each protein sequence. A similarity/dissimilarity analysis is then done by evaluating the Euclidean distance or the correlation angle between these descriptors. The smaller the Euclidean distance or correlation angle between two descriptors, the more similar the corresponding sequences.
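To make the descriptor comparison concrete, here is a minimal Python sketch, not the method of any particular paper. It assumes a deliberately simple stand-in descriptor (the 20-component amino acid composition of a sequence) in place of a descriptor derived from a graphical representation. The function names and the toy sequences are hypothetical. The sketch computes the two measures mentioned above: the Euclidean distance and the correlation angle, taken here as the angle between two descriptor vectors.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_descriptor(sequence):
    """Assumed stand-in descriptor: relative frequency of each amino acid."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation_angle(u, v):
    """Angle in radians between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Toy sequences for illustration only; they are not real database entries.
desc_a = composition_descriptor("ACDEFGHIKL")
desc_b = composition_descriptor("ACDEFGHIKM")  # differs from the first by one residue
desc_c = composition_descriptor("WWWWYYYYVV")  # very different composition

print("a vs b:", euclidean_distance(desc_a, desc_b), correlation_angle(desc_a, desc_b))
print("a vs c:", euclidean_distance(desc_a, desc_c), correlation_angle(desc_a, desc_c))
```

Under both measures a smaller value means greater similarity, so the first pair should come out closer than the second. A composition vector ignores residue order, which is one reason real descriptors are built from richer numerical characterizations.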
Similarity/dissimilarity analysis is a key way of understanding the biology of an organism, because it helps identify the origin of new genes and sequences. Sequence data are grouped according to biological relationships, and the number of sequences in any group is liable to grow every day. Current alignment-free methods demonstrate their utility by producing a similarity/dissimilarity matrix. Although this matrix is clear, it measures the degree of similarity only between individual sequences. In our work, a representative is introduced for each of three groups of protein sequences. Based on the group representative, a similarity/dissimilarity vector is evaluated instead of the ordinary similarity/dissimilarity matrix. The approach is applied to three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-group comparison is produced to confirm that each group is distinct. A qualitative comparison between our approach, previous articles, and the phylogenetic tree of these protein sequences demonstrated the utility of our approach.
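The idea of replacing the pairwise matrix with a vector measured against a group representative can be sketched as follows. The text does not say how the representative is constructed, so this sketch assumes, purely for illustration, that it is the component-wise mean of the group's descriptors. The function names and the two-dimensional toy descriptors are likewise hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_descriptor(descriptors):
    """Assumed group representative: the component-wise mean descriptor."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


def dissimilarity_vector(descriptors, representative):
    """One distance per sequence, measured against the representative."""
    return [euclidean_distance(d, representative) for d in descriptors]


# Toy 2-D descriptors standing in for two groups of protein sequences.
group_a = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
group_b = [[5.0, 5.0], [5.5, 4.5]]

rep_a = mean_descriptor(group_a)

# Within-group vector: n values instead of an n x n matrix.
within_a = dissimilarity_vector(group_a, rep_a)

# Cross-group check: members of the other group measured against the
# representative of group A should lie farther away than A's own members.
cross_b_to_a = dissimilarity_vector(group_b, rep_a)

print("within A:", within_a)
print("B against A's representative:", cross_b_to_a)
```

With a representative in hand, a newly added sequence needs only one distance per group rather than a comparison against every existing sequence, which suits groups that keep growing.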
