DomainParser method description
DomainParser takes a top-down approach to domain decomposition, implemented with graph-theoretical methods. Each residue is modeled as a node and each connection between residues as an edge. Two residues are connected either when they are adjacent in sequence or when they are in close proximity in the 3D structure. Spatial proximity requires that at least one atom from each residue lies within 4 Å of the other. The strength of a connection (referred to as its capacity) between two nodes is proportional to the strength of the interaction between the two residues the nodes represent.
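The graph construction described above can be sketched in a few lines of Python. This is only an illustration, not DomainParser's actual code: the function names, the input layout (one list of atom coordinates per residue, in sequence order), and above all the capacity formula are assumptions. The text says only that capacity is proportional to interaction strength, so the sketch uses the number of atom pairs within the cutoff as a stand-in.

```python
import math
from itertools import product

CONTACT_CUTOFF = 4.0  # in Å, the proximity threshold quoted in the text


def count_close_atom_pairs(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """Count atom pairs (one atom from each residue) within the cutoff."""
    return sum(
        1 for a, b in product(atoms_a, atoms_b) if math.dist(a, b) <= cutoff
    )


def build_residue_graph(residues, cutoff=CONTACT_CUTOFF):
    """Build a residue graph as a dict {(i, j): capacity} with i < j.

    residues: list of residues in sequence order, each a list of
              (x, y, z) atom coordinates (hypothetical input layout).
    """
    edges = {}
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            sequential = j == i + 1
            close_pairs = count_close_atom_pairs(residues[i], residues[j], cutoff)
            if sequential or close_pairs > 0:
                # ASSUMPTION: close-pair count as a proxy for interaction
                # strength; sequence neighbours get a capacity of at least 1.
                edges[(i, j)] = max(1, close_pairs)
    return edges
```

For example, `build_residue_graph([[(0, 0, 0)], [(3.8, 0, 0)], [(20, 0, 0)]])` connects residues 0 and 1 (adjacent and in contact) and residues 1 and 2 (adjacent only), but not residues 0 and 2.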
The protein is divided into domains by systematically splitting the structure into two parts, which is equivalent to separating a network into two parts using a minimum cut approach. The division is then repeated on each resulting domain until one of the stop criteria is reached. This constitutes the first step of the algorithm. In the second step (a post-processing step), a number of parameters are used to evaluate the suitability of the potential domains generated in step 1. The parameters include compactness, radius of gyration, the number of non-contiguous segments per domain, and the distribution of domain sizes. The minimum length of a domain is 35 residues, and β-strands are not cut unless they act as a narrow polypeptide segment connecting two or more domains.
Among the four automatic methods, DomainParser shows the highest propensity toward undercutting, that is, predicting fewer domains than experts assign. The least problematic structures are large α-class structures, such as large orthogonal bundles and up-down bundles, as well as large structures in the α/β class, such as complex structures, horse-shoe and 3-layer sandwiches (view table). The main problem is the failure to continue successful partitioning after the first round, that is, to subdivide the resulting domains further. Size and compactness of the domain appear to be the overriding factors. DomainParser can partition rather complex architectures correctly as long as the resulting domains are either large or very compact (Figure 1A, 1B).
We suspect that β-strand interactions contribute greatly to this problem, since the method has a good success rate for α-class structures. We also observe that one β-class architecture, the immunoglobulin-like sandwich, is particularly difficult for DomainParser (Figure 1C, 1D, 1E). This rather simple architecture may epitomize the β-structure issues for this method.
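To make the bisection step of the method concrete, here is a minimal sketch of splitting a capacity-weighted residue graph in two with a minimum s-t cut, computed by the Edmonds-Karp max-flow algorithm. It is not DomainParser's implementation: the text does not say how source and sink residues are chosen or how the stop criteria are applied, so `source`, `sink`, and the edge dictionary layout are hypothetical.

```python
from collections import deque


def min_cut_bipartition(n, edges, source, sink):
    """Split nodes 0..n-1 into two sides by a minimum s-t cut.

    edges: dict {(u, v): capacity} describing an undirected graph.
    source, sink: seed nodes for the two sides (ASSUMPTION: how the real
    method picks them is not described in the text).
    Returns (cut_value, source_side, sink_side).
    """
    # Residual capacities; an undirected edge is usable in both directions.
    residual = [dict() for _ in range(n)]
    for (u, v), cap in edges.items():
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v][u] = residual[v].get(u, 0) + cap

    def reachable_with_parents():
        # Breadth-first search over edges that still have spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    cut_value = 0
    while True:
        parent = reachable_with_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        cut_value += bottleneck

    # Nodes still reachable from the source form one side of the cut.
    source_side = sorted(parent)
    sink_side = [v for v in range(n) if v not in parent]
    return cut_value, source_side, sink_side
```

On a toy graph of two tightly connected triplets joined by a single weak edge, the cut falls on the weak edge. In a top-down scheme, the same function would then be applied to each side in turn until a stop criterion, such as the 35-residue minimum domain length, prevents further splitting.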
DomainParser performs the most extensive evaluation of the potential domains it assigns. Its failure to further partition large domains may be due to a bias in the post-processing step, during which small domains are evaluated and are either granted the status of domains or joined together. Since the multiple criteria used in the post-processing step were trained using SCOP, it is likely that parameters set to favor large domains will undercut relative to other expert methods, just as SCOP tends to do. This will affect the distribution of expected sizes as well as the distribution of the number of fragments used in the post-processing step. Another issue might be improper tuning of β-strand cutting: DomainParser prefers not to cut β-strands between two domains, so it often keeps large units with interacting β-strands together (Figure 1C, 1D, 1F, 1G). All this points to the possibility that multiple decision-making factors in the post-processing step are not tuned correctly. At the same time, DomainParser comes closest among the algorithms to correctly predicting the level of domain fragmentation when compared to expert methods; it also has the most precisely assigned domain boundaries among the four algorithmic methods.
Figure 1. Domain assignment by DomainParser method.
This work is sponsored by the National Institutes of Health (NIH) Grant Number GM63208 (NIH/NIGMS).