In silico phylogenetic, physicochemical, and structural characteristics of phytase enzyme from ten Aspergillus species



Abstract
Phytic acid, a chemical compound consisting of inositol and phosphoric acid, is an antinutrient present in cereal-based feed ingredients for monogastric poultry. Phytase hydrolyzes the phosphoester bonds in phytic acid, releasing inorganic phosphate and phosphate esters. Aspergillus is a genus of molds that produce phytase, and it has been widely used in phytase production because its species are easy to culture. This study aimed to compare the structures, physicochemical characteristics, and phylogenetic relationships of phytases from several Aspergillus species in silico as an initial screening step toward identifying the most suitable phytase for poultry feed. Phylogenetic trees were constructed using MEGA 11, and physicochemical characteristics were analyzed using ProtParam. Protein structures were modeled with AlphaFold. The phytase structures were then docked with phytic acid using YASARA Structure. The results showed that phytases 1QFX from Aspergillus niger, P34755 from A. awamori, and D5HQ11 from A. ficuum are highly similar in phylogeny, sequence, physicochemical characteristics, and protein structure. Among the three docked structures, 1QFX had the most negative ΔG and the lowest Kd, indicating the highest affinity for the phytic acid substrate. This research concludes that among the three phytase structures compared and docked with phytic acid, phytase 1QFX from A. niger is the most suitable for application in poultry feed.
[Keywords: phytic acid, molecular docking, structure modeling, superpose]

Grains from legume and cereal groups are widely used as raw materials in animal feed production. The use of grains in animal feed provides many advantages because they contain complete nutrients, such as protein, carbohydrate, fat, crude fiber, Fe, K, and Ca, that meet the daily nutritional needs of animals, stimulate growth, and thus improve farm productivity. In addition, grains are easy to obtain because of the large supply from agricultural businesses (Purnamasari et al., 2018). However, using grain as a raw material in animal feed production has a drawback: the phytic acid content of plant seeds (Silvia et al., 2016).
Grains contain high levels of phytic acid, a compound with antinutrient properties (Cominelli et al., 2020) that can reduce feed digestibility in livestock. Phytic acid (myo-inositol 1,2,3,4,5,6-hexakisphosphate) is a chemical compound with the formula C₆H₁₈O₂₄P₆, composed of six phosphate groups bound to an inositol ring. Together with its precursors and derivatives, phytic acid is the main storage form of phosphate in plants. According to Rasyid et al. (2017), phytic acid is an antinutritional factor often found in grains that can impair or prevent the utilization of feed ingredients, affect the physical and physiological condition of animals, and tends to have detrimental effects on animal productivity. Phytic acid removes nutrients because it inhibits various digestive enzymes and binds monosaccharides, complex peptides, and metal minerals, thereby reducing the overall nutritional availability of the feed. A high phytic acid content in feed can cause phosphorus deficiency in livestock (Yanuartono et al., 2019).
One way to eliminate phytic acid in animal feed is to use the phytase enzyme. Phytase, or myo-inositol hexakisphosphate phosphohydrolase (EC 3.1.3.8), is a hydrolase belonging to the phosphatase group. It catalyzes cleavage of the phosphoester bonds between the phosphate groups and inositol in phytic acid, releasing myo-inositol and inorganic phosphate (Figure 1). Phytase can also yield lower myo-inositol phosphates as products. Phytase is an extracellular enzyme (Nurhikmah, 2017). Phytate hydrolysis catalyzed by phytase abolishes the chelating properties and strong binding ability of phytate. This eliminates the antinutrient properties of phytate, making the enzyme a promising additive in animal feed preparation to increase nutritional value. Phytases are classified into four groups based on their phytate hydrolysis mechanism: histidine acid phytases, protein tyrosine phytases, purple acid phytases, and β-propeller phytases. Histidine acid phytases (HAPhy) have optimum hydrolytic activity under acidic conditions and are found in many animals, plants, and several microorganisms (Lei et al., 2013).
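The overall stoichiometry of complete hydrolysis (phytic acid + 6 H₂O → myo-inositol + 6 inorganic phosphate) can be checked with a short atom-balance sketch. It assumes the fully protonated formulas C₆H₁₈O₂₄P₆ (phytic acid), C₆H₁₂O₆ (myo-inositol) and H₃PO₄ (inorganic phosphate), ignores the lower inositol phosphate intermediates and the ionization states present in the digestive tract, and uses rounded standard atomic weights; it also gives the molar mass of phytic acid.

```python
from collections import Counter

# Atom counts of the species in the overall reaction (fully protonated forms assumed).
PHYTIC_ACID = Counter({"C": 6, "H": 18, "O": 24, "P": 6})   # C6H18O24P6
WATER = Counter({"H": 2, "O": 1})
MYO_INOSITOL = Counter({"C": 6, "H": 12, "O": 6})           # C6H12O6
PHOSPHORIC_ACID = Counter({"H": 3, "O": 4, "P": 1})         # H3PO4, inorganic phosphate

# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

def scale(formula: Counter, n: int) -> Counter:
    """Multiply every atom count by a stoichiometric coefficient."""
    return Counter({element: count * n for element, count in formula.items()})

def molar_mass(formula: Counter) -> float:
    """Sum of atomic weights weighted by atom counts, in g/mol."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())

# Complete hydrolysis: one water molecule is consumed per phosphoester bond, six in total.
left = PHYTIC_ACID + scale(WATER, 6)
right = MYO_INOSITOL + scale(PHOSPHORIC_ACID, 6)

print(left == right)                       # True: every element balances
print(round(molar_mass(PHYTIC_ACID), 2))   # 660.03
```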
Aspergillus, one group of phytase-producing microbes, is a suitable phytase source for poultry feed to increase its nutritional value. Aspergillus is widely distributed in both tropical and subtropical areas and has a very high species abundance in nature. In addition, various Aspergillus species have also been used by humans, especially in conventional fermented food industries producing soy sauce, miso, and tauco (Daugelaite et al., 2013). According to Mizana et al. (2016), Aspergillus sp. has a high sporulation rate and is easy to culture in the laboratory, which facilitates laboratory-scale phytase production. It is therefore necessary to compare the physicochemical characteristics of phytases from several Aspergillus species to screen for the most suitable enzyme for the feed industry.
This research focuses on the in silico characterization of Aspergillus phytases. Previous research has demonstrated the ability of Aspergillus isolates to produce phytase that hydrolyzes phytic acid. However, few studies have compared the structure and characteristics of phytases from various Aspergillus species. The present research aims to compare the phylogenetic relationships and the physicochemical and structural characteristics of phytases from several commonly cultured Aspergillus species in silico. The analyses include gene-level and primary structure analysis, prediction and comparison of the three-dimensional structures of the phytases, and molecular docking with phytic acid. This study is an initial screening step to select the most suitable phytase for application in the poultry feed industry.

Figure 1. Hydrolysis of phytic acid by phytase (Coban et al., 2017)

Tools and materials
The tools used in this research were an HP 14s-dk0073AU laptop with an AMD A4-9125 dual-core processor, 4 GB DDR4-1866 RAM, and the Windows 10 Home Single Language 64-bit operating system, together with the following software: Molecular Evolutionary Genetics Analysis version 11.0 (MEGA11), Discovery Studio Visualizer 2017 R2 Client, LigPlot+, YASARA Structure, PyMOL, PROCHECK, ProtParam, Clustal Omega, and AlphaFold.

Signal peptide removal
A total of five phytase sequences were downloaded from the RCSB PDB (https://www.rcsb.org/) and ten others from UniProtKB (https://www.uniprot.org/). The gene sequences encoding these fifteen proteins were downloaded from the EMBL-EBI database. FASTA files were then prepared by identifying and removing the signal peptide at the N-terminus of each protein sequence. Signal peptides were predicted using SignalP 6.0 (https://services.healthtech.dtu.dk/services/SignalP-6.0/) (Gómez et al., 2017).
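Once SignalP 6.0 has predicted the cleavage site, removing the signal peptide is a simple slice of the sequence. A minimal sketch, assuming the cleavage position has been read off the SignalP result by hand; the toy sequence and the header `toy_mature` are hypothetical and do not correspond to any of the fifteen phytases.

```python
import textwrap

def remove_signal_peptide(sequence: str, cleavage_after: int) -> str:
    """Return the mature protein after removing the first `cleavage_after` residues.

    `cleavage_after` is the 1-based position of the last signal-peptide residue,
    e.g. 23 when the predicted cleavage site lies between residues 23 and 24.
    """
    if not 0 <= cleavage_after < len(sequence):
        raise ValueError("cleavage position lies outside the sequence")
    return sequence[cleavage_after:]

def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA with fixed-width sequence lines."""
    return f">{header}\n" + "\n".join(textwrap.wrap(sequence, width)) + "\n"

# Hypothetical toy sequence: an 8-residue "signal peptide" followed by the mature chain.
mature = remove_signal_peptide("MKKLLAALASTDEQW", cleavage_after=8)
print(to_fasta("toy_mature", mature), end="")
```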

Phylogenetic analysis
Initial alignment was carried out using NCBI protein BLAST (Basic Local Alignment Search Tool), or BLASTp (https://www.ncbi.nlm.nih.gov/), to determine the level of similarity between the protein sequences. The BLAST search was followed by multiple sequence alignment (MSA) to obtain data on similarities among the phytase sequences. MSA was carried out using ClustalW on the Clustal Omega platform (https://www.ebi.ac.uk/Tools/msa/clustalo/). All FASTA protein sequences were then used to build the phylogenetic tree in MEGA11 (Mizana et al., 2016).

Prediction of physicochemical character
The physicochemical characteristics of the phytases were analyzed on the Expert Protein Analysis System (ExPASy) Molecular Biology Server portal using the ProtParam tool (https://web.expasy.org/protparam/) (Pramanik et al., 2018).

Protein structure modeling and comparison
The phytase protein structures were modeled before molecular docking. The modeled structures were those of the three phytases with the highest similarity in terms of phylogeny, sequence alignment, and physicochemical parameters, namely 1QFX, D5HQ11, and P34755. Modeling was carried out using AlphaFold (https://alphafold.ebi.ac.uk/). The quality of the modeled structures was then assessed using Ramachandran plots generated on the PROCHECK server (https://saves.mbi.ucla.edu/) (Irfanuddin, 2018).
The structure models of 1QFX, D5HQ11, and P34755 were compared by superimposition in PyMOL (https://pymol.org/). The root mean square deviation (RMSD) over all atoms was used as the parameter for assessing the quality of the structure models. The aligned positions of the structures were also compared and visualized (Akdel et al., 2020).
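The RMSD reported after a superposition is the square root of the mean squared distance between matched atoms. The sketch below assumes the coordinates have already been superposed and paired atom by atom (for example, matching Cα atoms); it does not reproduce PyMOL's own fitting, whose `align` command by default also runs outlier-rejection cycles, so PyMOL's value may be computed over fewer atoms. The helper `is_close_match` (a name introduced here) applies the 2 Å rule of thumb used in the results.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD (same unit as the input, e.g. Å) between two superposed, atom-matched lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def is_close_match(value_angstrom: float, threshold: float = 2.0) -> bool:
    """True when the RMSD is below the threshold (2 Å by default)."""
    return value_angstrom < threshold

# Toy example: every atom displaced by exactly 1 Å along z.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))   # 1.0
print(is_close_match(0.557))            # True for the RMSD reported for the three phytases
```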

Molecular docking of phytase
Molecular docking of the phytase enzymes with the test ligand phytic acid was carried out using YASARA Structure. The prepared protein file in *.sce format was opened, and docking was targeted at the known active-site residues of phytase, namely Arg62, His63, Arg66, Asp75, Arg156, Glu272, His318, and Asp319. Docking was performed by copying the dock_run.mcr macro file to the working folder and editing it so that docking was run 25 times. The output files (*.yob and *.txt) containing the binding free energies, dissociation constants, and interacting amino acid residues were used as the basis for further analysis. The 2D interactions between amino acid residues and the ligand were visualized and analyzed using LigPlot+. PyMOL was used to analyze ligand-protein interactions and to visualize the position of the ligand within the protein binding site (Ramdani et al., 2019).

Alignment and phylogenetic analysis
The phytase protein sequences used in this research were downloaded from the UniProt and PDB databases. A total of 15 phytase sequences were screened to identify the phytase with the best characteristics (Table 1). The 15 sequences were chosen from ten Aspergillus species widely cultured in laboratories. Sequence alignment is needed to compare the similarities and residues of each selected enzyme sequence. Sequence alignment compares two or more DNA or protein sequences to determine their level of similarity. One way to analyze similarities among protein or DNA sequences is BLAST, a bioinformatics tool closely linked to the NCBI databases. BLAST searches for protein or DNA entries whose sequences are similar to a sequence of interest, and BLASTp is used specifically to search for similar proteins in the NCBI protein databases (Karunasekera, 2013).
The BLASTp alignment using 1QFX phytase from A. niger as the query showed that it was identical to P34755 from A. awamori (Figure 1). The P34755 sequence also had the highest query cover and percent identity. The D5HQ11 sequence from A. ficuum had the same query cover and a high percent identity to 1QFX and P34755. The lowest percent identity (24.22%) was shown by A0A1L9VZ88 from A. glaucus. BLASTp uses parameters such as percent identity and query cover to assess the level of protein homology. Percent identity represents the similarity between the query and target sequences, that is, the proportion of aligned positions occupied by identical residues. The higher the percent identity, the more similar the query sequence is to the target sequence. Query cover denotes the percentage of the query sequence length included in the alignment with the target sequence; it is 100% when the alignment spans the entire query. It therefore provides information on the lengths of the sequences relative to each other, and a higher query cover indicates that a larger part of the query aligns with the target sequence (Pratiwi et al., 2020).

Multiple sequence alignment of the 15 phytase sequences showed similarity (homology) across all sequences (Figure 1). Red letters indicate residues conserved among Aspergillus phytases, while yellow highlighting indicates residues conserved in phytases from all living organisms. The catalytic site of the enzyme comprises the residues Arg, His, Arg, Arg, His, and Asp (the RHRRHD motif). Substrate binding sites were identified as homologous Asp and Glu residues present in all sequences. Conserved residues clustered in one region form conserved domains, which are shared by proteins with similar functions. These domains occur in proteins with similar functions even when the proteins come from different species, and such proteins generally share functional activities such as catalytic mechanisms. Conserved domains are a characteristic that may indicate the relationship between similar proteins from different organisms (Li et al., 2019). The conserved sequences in Aspergillus phytases play an important role as the main domain because they form the catalytic sites and confer the same catalytic activity in phytic acid hydrolysis (Wang et al., 2021).
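As a rough illustration of the two BLASTp statistics described above, the sketch computes them for a single gapped alignment, with "-" marking a gap. It is a simplification under stated assumptions: BLAST reports identity per aligned segment and may combine several aligned segments when computing query cover, which this one-alignment version does not attempt. The short sequences are toy examples.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Identical aligned positions as a percentage of the alignment length (gaps included)."""
    if len(aligned_query) != len(aligned_target) or not aligned_query:
        raise ValueError("aligned strings must be non-empty and of equal length")
    identical = sum(q == t and q != "-" for q, t in zip(aligned_query, aligned_target))
    return 100.0 * identical / len(aligned_query)

def query_cover(aligned_query: str, query_length: int) -> float:
    """Percentage of the full query sequence that takes part in the alignment."""
    aligned_residues = sum(c != "-" for c in aligned_query)
    return 100.0 * aligned_residues / query_length

# Toy alignment of a hypothetical 8-residue query (only 5 of its residues are aligned).
print(round(percent_identity("ACD-EF", "ACDGEY"), 2))   # 66.67 (4 identical of 6 columns)
print(query_cover("ACD-EF", query_length=8))            # 62.5
```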
Molecular phylogenetic analysis determines the relationships among species through statistical comparison of their DNA or protein sequences (Yu et al., 2017). In phylogenetic analysis, a phylogenetic tree is constructed to visualize the evolutionary relationships of the sample organisms based on their sequence similarities (Tindi et al., 2017). The phylogenetic tree was constructed using the Poisson model with the pairwise deletion option. The goal of this approach is to correctly identify neighboring sequences and to produce branches that fit the original data as closely as possible. In terms of branch length, distance scale, and node parameters, a close relationship to the parent (ancestor) is indicated by a branch length near 0.0 and a small distance value. The sequences are combined to construct the predicted tree branches and calculate branch lengths, which is the most accurate method for trees with short branches (Pangestika et al., 2015).
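The Poisson model with pairwise deletion mentioned above can be sketched as a distance calculation: for each pair of aligned sequences, columns with a gap in either sequence are skipped for that pair only, the proportion p of differing sites is computed, and the Poisson correction d = -ln(1 - p) is applied. This is a minimal illustration, not MEGA11's code; the tree-building step that MEGA11 would then apply to the distance matrix is not shown, and the toy sequences are hypothetical.

```python
import math
from itertools import combinations

def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    # Pairwise deletion: drop a column for this pair only if either sequence has a gap there.
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)  # proportion of differing sites
    if p >= 1.0:
        raise ValueError("p-distance too large for the Poisson correction")
    return -math.log(1.0 - p)

def distance_matrix(alignment: dict) -> dict:
    """All pairwise Poisson distances, keyed by (name_a, name_b) in sorted order."""
    return {
        (a, b): poisson_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }

toy_alignment = {"seq1": "ACDEFGHIKL", "seq2": "ACDEFGHIKM", "seq3": "AC-EFGHIKM"}
for pair, d in distance_matrix(toy_alignment).items():
    print(pair, round(d, 4))
```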
The phylogenetic analysis of 15 phytase protein sequences resulted in a rectangular phylogenetic tree with two main branches at the ancestral node (Figure 2). The first branch consists of 10 sequences and the

Prediction of physicochemical characteristics
Physicochemical characteristics are typical characteristics of a protein determined by the arrangement, composition, and conformation of its constituent amino acid residues (Bintang et al., 2020). The physicochemical characteristics of a protein influence its macroscopic physical appearance and molecular activity. These characteristics are the main parameters that determine how a protein is produced and applied. Determining them is therefore an important part of the selection and optimization of proteins and enzymes. Commonly determined physicochemical characteristics include the isoelectric point, hydrophobicity, optical properties, density, solubility, and toxicity (Gómez et al., 2017). Although some physicochemical characteristics can only be determined experimentally, most can be predicted using protein sequence alignment and physicochemical property prediction software (Kostrewa et al., 1999).
The physicochemical parameters of the 15 Aspergillus phytases predicted with ProtParam differed among the enzymes (Table 2). However, 1QFX had physicochemical characteristics identical to those of P34755, and the properties of these two phytases were very similar to those of D5HQ11. This is consistent with the three phytases being very closely related and therefore having similar physicochemical characteristics. The highest number of residues was observed in phytase A0A1L9VZ88 (525 residues) and the lowest in 1IHP (438 residues). In general, the number of residues is positively correlated with molecular weight. According to Bintang et al. (2020), an increase in protein molecular weight indicates good protein thermostability due to the [...]. The instability index gives an estimate of the stability of a protein in vitro; a protein with an instability index above 40 is predicted to be unstable. Based on the results, 1IHP, 3K4Q, and B3VPB2 are unstable because their instability indices are greater than 40. The aliphatic index indicates the relative volume of a protein occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). The aliphatic index cannot be used as a reference for determining protein parameters (Yu et al., 2017).
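The aliphatic index mentioned above follows a simple composition formula. The sketch uses the coefficients given in the ProtParam documentation (2.9 for valine and 3.9 for isoleucine and leucine, relative to alanine), with residue contents expressed as mole percentages. The instability index is not reproduced here because it depends on a 400-entry dipeptide weight table; the test sequences are toy examples.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(...) are mole percentages (100 x mole fraction) of each residue.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

print(round(aliphatic_index("AVIL"), 1))   # 25 + 2.9*25 + 3.9*50 = 292.5
```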
The isoelectric point (pI) is the pH at which a protein carries no net electric charge; at this pH its solubility is lowest and it tends to precipitate from solution (Azhari et al., 2017). Because the pI is generally close to the optimum pH, the ideal phytase for poultry feed should have a pI between 3.47 and 6.43 so that it is adapted to the pH of the animal's digestive tract. 1QWO and 1SKB should be avoided because they have a pI of 7.04, which is in the weakly basic range. In contrast, 1QFX and P34755 both had a pI of 4.51, while D5HQ11 had a pI of 4.45. These three phytases have a pI within the range of 3.47-6.43 and are therefore suitable for application in poultry feed.
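To show how a pI is estimated from sequence alone, the sketch sums Henderson-Hasselbalch charges of the ionizable groups and bisects on pH until the net charge reaches zero, then applies the 3.47-6.43 screening window used above. The pKa values are an illustrative assumption: ProtParam uses the Bjellqvist values, which depend on residue position, so this sketch will not reproduce ProtParam's numbers exactly. The names `net_charge`, `isoelectric_point`, and `in_feed_range` are introduced here for illustration.

```python
# Illustrative pKa values (assumption; not the set used by ProtParam).
PKA_BASIC = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge of a protein at a given pH."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_TERM"] = counts["C_TERM"] = 1   # one free amino and one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative

def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2

def in_feed_range(pi: float, low: float = 3.47, high: float = 6.43) -> bool:
    """Screening window for poultry-feed phytases used in the text."""
    return low <= pi <= high

print(round(isoelectric_point("G"), 2))        # 6.1, midway between the two terminal pKa values
print(in_feed_range(4.51), in_feed_range(7.04))
```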
GRAVY (grand average of hydropathicity) is the sum of the hydropathy values of all amino acids in a protein divided by the number of residues. The GRAVY value of a protein is a measure of its hydrophobicity or hydrophilicity, and both properties are expressed on a single hydropathy scale, or hydropathy index. GRAVY values range from -2 to +2 for most proteins, with positive values indicating more hydrophobic proteins (Kaur et al., 2020). None of the 15 phytases tested had a GRAVY value above +1.
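GRAVY as described above can be computed directly from the Kyte-Doolittle hydropathy scale, which ProtParam uses. A minimal sketch; skipping non-standard letters (such as X) is an assumption of this example rather than documented ProtParam behavior.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues of the sequence."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard residues in the sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)

value = gravy("MKTAYIAKQR")   # toy peptide
print(round(value, 3), "hydrophobic" if value > 0 else "hydrophilic")
```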

The structure of 1QFX, D5HQ11 and P34755
Structural modeling was carried out to predict the three-dimensional structures of phytases whose experimental structures are not yet available in the RCSB PDB database. Modeling was carried out using the fold recognition principle, which gives more accurate results by recognizing domains and folds in actual structures. AlphaFold is an AI system developed by DeepMind that predicts a protein's 3D structure from its amino acid sequence. AlphaFold greatly improves the accuracy of structure prediction by incorporating novel neural network architectures and training procedures based on the evolutionary, physical, and geometric constraints of protein structures (Jumper et al., 2021). The phytases selected for structural analysis were those with the highest similarity and properties suited to the poultry digestive tract: 1QFX from A. niger, D5HQ11 from A. ficuum, and P34755 from A. awamori. Their protein structures are therefore needed for further analysis. The structure of 1QFX is available on the RCSB PDB website (Figure 3), whereas the structures of D5HQ11 and P34755 are not yet available and were therefore modeled using AlphaFold.
The 1QFX phytase is a homodimeric protein with two identical polypeptide chains, namely chains A and B. Each chain consists of 460 amino acid residues with several attached ligands such as N-acetylglucosamine and mannose, although these two ligands do not play a role in the catalytic process.Therefore, it is necessary to remove these ligands in molecular docking analysis (Wang et al., 2021).
A modeled structure differs from a structure determined directly by crystallography. Modeled structures generally do not contain natural ligands because structure modeling programs use only the amino acid sequence as input (Jumper et al., 2021). The D5HQ11 structure modeled with AlphaFold is a homotetramer of four chains, each consisting of 460 residues (Figure 4A). No natural ligand is attached to the modeled structure. A similar result was observed for the modeled P34755 structure (Figure 4B).
The three structures were then compared in PyMOL by superposing 1QFX, D5HQ11, and P34755 (Figure 5). The comparison shows that the three proteins are highly similar in every domain, despite small differences in their physicochemical characteristics. The active-site residues also occupied the same positions in all three structures. The parameter used to assess structural quality and homology was the root mean square deviation (RMSD), which compares the positions of corresponding atoms in the residues that make up the structures of 1QFX, D5HQ11, and P34755. An RMSD below 2 Å indicates close agreement between the atoms of the structures (Wang et al., 2021). The three structures had an RMSD of 0.557 Å, indicating substantial similarity.

The structural quality of proteins modeled with AlphaFold can be assessed using a Ramachandran plot, a diagram of the distribution of amino acid residues over two backbone dihedral angles: phi (ϕ) on the x-axis and psi (ψ) on the y-axis (Schmidt et al., 2013). The Ramachandran plot is divided into four regions: the most favored regions (quadrant I) in red, additional allowed regions (quadrant II) in yellow, generously allowed regions (quadrant III) in pale yellow, and disallowed regions (quadrant IV) in white. The Ramachandran plots for D5HQ11 and P34755 were generated using PROCHECK, whereas the Ramachandran plot of 1QFX was downloaded from the RCSB PDB.
The analysis showed that 1QFX had 90.3% of residues in quadrant I, 9.2% in quadrant II, 0.5% in quadrant III, and 0% in quadrant IV (Figure 6A). D5HQ11 had 89.8% of residues in quadrant I, 9.6% in quadrant II, 0.3% in quadrant III, and 0.3% in quadrant IV (Figure 6B). The Ramachandran plot of P34755 shows 89.8% of residues in quadrant I, 9.7% in quadrant II, 0.4% in quadrant III, and 0% in quadrant IV (Figure 6C). A good-quality model or structure is expected to have more than 80% of residues in the most favored regions and less than 1% of non-glycine residues in disallowed regions (Peter et al., 2017). The 3D structure of 1QFX obtained from the RCSB PDB was therefore of good quality. The 3D models of D5HQ11 and P34755 were also of good quality, with more than 80% of residues in the most favored regions and less than 1% in disallowed regions.
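The quality rule above (more than 80% of residues in the most favored regions and less than 1% in disallowed regions) can be expressed as a small check. The sketch applies it to the percentages reported for 1QFX, D5HQ11, and P34755; `region_percentages` converts raw per-region counts into percentages, and the counts passed to it should exclude glycine and proline, as PROCHECK's summary statistics do. The function names are introduced here for illustration.

```python
def region_percentages(counts: dict) -> dict:
    """Convert per-region residue counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {region: 100.0 * n / total for region, n in counts.items()}

def passes_ramachandran(favored_pct: float, disallowed_pct: float,
                        min_favored: float = 80.0, max_disallowed: float = 1.0) -> bool:
    """Quality rule from the text: > 80% most favored and < 1% disallowed."""
    return favored_pct > min_favored and disallowed_pct < max_disallowed

# Percentages reported in the text: (most favored, disallowed).
REPORTED = {"1QFX": (90.3, 0.0), "D5HQ11": (89.8, 0.3), "P34755": (89.8, 0.0)}

for name, (favored, disallowed) in REPORTED.items():
    print(name, passes_ramachandran(favored, disallowed))
```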

Visualization of molecular docking
Phytic acid (PubChem CID 890) was docked to 1QFX, D5HQ11, and P34755. The docking in this study was targeted: the test ligand or substrate was placed directly at the active site of the enzyme (Figure 7), because the modeled proteins have no natural ligand. The grid box was created and the test ligand docked directly at the active site known from previous analyses (Suhadi et al., 2013). The active-site residues in 1QFX are Arg62, His63, Arg66, Asp75, Arg156, Glu272, His318, and Asp319. Asp75 and Glu272 act as substrate specificity sites, while the other residues form the catalytic site (Wang et al., 2021). These residues are conserved and have the same positions and roles in the D5HQ11 and P34755 enzymes.
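Targeted docking needs the grid box (or docking cell) placed over the active site. One common choice, assumed here and not necessarily how the YASARA macro was configured in this study, is to center the box on the centroid of the active-site atoms. The sketch computes that centroid from PDB-format ATOM lines using the fixed PDB columns; residue numbers must follow the numbering of the structure file, which for AlphaFold models may differ from the 1QFX numbering.

```python
# Active-site residue numbers of 1QFX listed in the text.
ACTIVE_SITE_RESIDUES = {62, 63, 66, 75, 156, 272, 318, 319}

def active_site_center(pdb_lines, residue_numbers=ACTIVE_SITE_RESIDUES, chain="A"):
    """Mean x, y, z of all ATOM records belonging to the given residues of one chain.

    Fixed PDB columns (1-based): chain ID 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54.
    """
    xs, ys, zs = [], [], []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM (e.g. glycans), headers, and other records
        if line[21] != chain or int(line[22:26]) not in residue_numbers:
            continue
        xs.append(float(line[30:38]))
        ys.append(float(line[38:46]))
        zs.append(float(line[46:54]))
    if not xs:
        raise ValueError("no atoms found for the requested residues")
    n = len(xs)
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)
```

Typical use would be `active_site_center(open("structure.pdb").read().splitlines())`, where the file name is hypothetical.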
Molecular docking is based on the formation of a complex molecular conformation through the interaction between a ligand or substrate and a protein, after which the stability of the resulting complex is analyzed (Meyer et al., 1998). This study used YASARA Structure as the docking software because it offers an energy minimization feature. The docking parameters that can be analyzed with YASARA include the affinity energy and the dissociation constant. A good docking result is protein-ligand binding with high affinity, that is, a strongly negative binding free energy and a low dissociation constant (Agistia et al., 2013).
The docking results, in the form of residues binding the phytic acid substrate, were then compared with the active site of the phytase enzyme (Table 3). Of the three enzymes, 1QFX has the substrate binding site that most closely matches the active site, with 6 binding residues identical to active-site residues. D5HQ11 has 4 binding residues that match the active site, and P34755 only 3. The comparison also shows that residues Arg62 and Arg66 bind the substrate in all three enzymes, because Arg62 and Arg66 are conserved catalytic residues in phytase (Kostrewa et al., 1999). Visualization of the docking results in PyMOL shows the pocket that binds phytic acid in each enzyme. The phytase binding pocket is long, with 2 small openings at the entrance and a narrow middle part. The phytic acid molecule is shown as a stick structure (Figure 8). The binding pocket is similar in all three enzymes, especially at the catalytic site. The catalytic sites were located far from Asp75 and Glu272, the substrate specificity residues. In all enzymes, the ligand is visible on the surface of the binding pocket.
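The comparison summarized in Table 3 amounts to a set intersection between the residues contacting phytic acid in each docked complex and the known active site. A minimal sketch, assuming the contacts have already been extracted from the docking output as labels such as "Arg62"; the contact lists used with it are hypothetical and are not the Table 3 data.

```python
# Active-site residues of 1QFX listed in the text (Wang et al., 2021).
ACTIVE_SITE = {"Arg62", "His63", "Arg66", "Asp75", "Arg156", "Glu272", "His318", "Asp319"}

def residue_number(label: str) -> int:
    """Numeric part of a label made of a three-letter code and a number, e.g. 'Arg62'."""
    return int(label[3:])

def active_site_hits(contacts):
    """Docking-contact residues that coincide with the known active site, sorted by position."""
    return sorted(ACTIVE_SITE & set(contacts), key=residue_number)

def shared_active_site_hits(*contact_lists):
    """Active-site residues contacted in every docked complex."""
    common = set.intersection(*(set(c) for c in contact_lists))
    return sorted(common & ACTIVE_SITE, key=residue_number)
```

For instance, `active_site_hits(["Arg62", "Tyr28", "Arg66"])` returns `["Arg62", "Arg66"]` for that made-up contact list, and the length of the result gives the number of matching residues.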


Energy affinity and dissociation constant
The molecular docking output is a log file containing affinity energy and dissociation constant values (Table 4). The binding energy, expressed as the affinity energy (ΔG), indicates the strength of the interaction between a protein and its ligand. A lower (more negative) affinity energy indicates a spontaneous binding process that forms a stable, strong bond between the ligand and the protein; conversely, a higher affinity energy indicates a less stable conformation and bond (Abusham et al., 2019). The dissociation constant (Kd), in turn, quantifies the equilibrium between the ligand free in solution and the ligand bound to the protein. A smaller Kd indicates a higher affinity of the substrate for the protein (Erna et al., 2016).

The affinity energies of the 1QFX, D5HQ11, and P34755 enzymes are -7.99 kcal/mol, -6.8 kcal/mol, and -6.46 kcal/mol, respectively. 1QFX has the most negative affinity energy and therefore the strongest affinity for phytic acid. The Kd values show the same trend: 1.3940 μM, 10.3299 μM, and 18.3263 μM for 1QFX, D5HQ11, and P34755, respectively. The lowest Kd value, that of 1QFX, indicates the strongest affinity for phytic acid (Dermawan et al., 2013). In summary, phytase 1QFX from A. niger has the strongest affinity for and binding to phytic acid.
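The reported affinity energies and dissociation constants appear consistent with the standard thermodynamic relation ΔG = RT ln Kd (Kd in mol/L). As a check under assumed conditions (T = 298.15 K and R = 1.987204 × 10⁻³ kcal mol⁻¹ K⁻¹; the temperature used by YASARA is not stated in the text), the sketch converts the three affinity energies into Kd values of roughly 1.39, 10.4, and 18.4 μM, within about 0.5% of the values reported above; the small differences are plausibly due to ΔG being rounded.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_dg(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant in mol/L from dG = RT ln Kd (dG < 0 for favorable binding)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))

def dg_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: binding free energy in kcal/mol from Kd in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# Affinity energies reported in the text (kcal/mol).
for name, dg in [("1QFX", -7.99), ("D5HQ11", -6.8), ("P34755", -6.46)]:
    print(f"{name}: Kd = {kd_from_dg(dg) * 1e6:.2f} uM")
```

Because the relation is monotonic, the ranking by ΔG and the ranking by Kd necessarily agree, which is why both parameters point to 1QFX.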
Phytase has been used in animal nutrition for two decades as a sustainable mineral source and has become a standard ingredient in feed for monogastric animals. In swine and poultry, phytase addition has improved the availability of minerals such as phosphorus, calcium, and zinc. Phytase addition also reduces fecal phosphorus excretion. Phytases are added to around 70% of monogastric animal diets (Troesch et al., 2013). In human studies, the use of phytase has so far focused mainly on the benefits for iron and zinc absorption. A. niger provided the very first industrial phytase, which was categorized as a 3-phytase (Ningsih et al., 2017). This research focused on in silico analyses of phytases. In silico protein analysis uses computational tools to simplify analyses that previously required more complicated steps. Such analyses are generally carried out as an initial screening step to determine the structural characteristics of proteins to be studied further. Computational tools produce data that support laboratory experiments on protein characterization (Madden et al., 2020).

Conclusion
The phytase enzymes with accession codes 1QFX from A. niger, D5HQ11 from A. ficuum, and P34755 from A. awamori have a fairly high level of homology and structural similarity. The three enzymes are also phylogenetically similar, with relatively small evolutionary distances, and have similar physicochemical parameters. The docking results, in the form of ΔG and Kd, show that 1QFX has the strongest affinity for and binding to its substrate, phytic acid. In conclusion, the phytase enzyme from A. niger is the most suitable for production and application in poultry feed.